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Abstract 

The  approach  of  modeling  measured  signals  as  superimposed  exponentials  in 
white  Gaussian  noise  is  popular  and  effective.  However,  estimating  the  parameters 
of  the  assumed  model  is  challenging,  especially  when  the  data  record  length  is  short, 
the  signal  strength  is  low,  or  the  parameters  are  closely  spaced. 

In  this  dissertation,  we  first  review  the  most  effective  parameter  estimation 
scheme  for  the  superimposed  exponential  model:  maximum  likelihood.  We  then 
provide  a  historical  review  of  the  linear  prediction  approach  to  parameter  estimation 
for  the  same  model.  After  identifying  the  improvements  made  to  linear  prediction 
and  demonstrating  their  weaknesses,  we  introduce  a  completely  tractable  and  sta¬ 
tistically  sound  modification  to  linear  prediction  that  we  call  iterative  generalized 
least  squares.  It  is  shown,  that  our  algorithm  works  to  minimize  the  exact  maximum 
likelihood  cost  function  for  the  superimposed  exponential  problem  and  is  therefore, 
equivalent  to  the  previously  developed  maximum  likelihood  approach.  However,  our 
algorithm  is  indeed  linear  prediction,  and  thus  revives  a  methodology  previously 
categorized  as  inferior  to  maximum  likelihood. 

With  our  modification,  the  insight  provided  by  linear  prediction  can  be  carried 
to  actual  applications.  We  demonstrate  this  by  developing  an  effective  algorithm 
for  deep  level  transient  spectroscopy  analysis.  The  signal  of  deep  level  transient 
spectroscopy  is  not  a  straight  forward  superposition  of  exponentials.  However,  with 
our  methodology,  an  estimator,  based  on  the  exact  maximum  likelihood  cost  function 
for  the  actual  signal,  is  quickly  derived.  At  the  end  of  the  dissertation,  we  verify  that 
our  estimator  extends  the  current  capabilities  of  deep  level  transient  spectroscopy 
analysis. 
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PARAMETER  ESTIMATION  FOR  SUPERIMPOSED 


WEIGHTED  EXPONENTIALS 


/.  Introduction 

1.1  Problem 

This  dissertation  is  focused  on  the  superimposed  weighted  exponential  parame¬ 
ter  estimation  problem.  Many  processes  throughout  applied  science  and  engineering 
are  modeled  by  this  paradigm,  e.g.  (59,  23,  2,  32,  45).  For  example,  in  high  frequency 
radar  applications,  a  reflecting  body  of  interest  can  be  approximated  as  a  collection 
of  independent  scattering  centers.  Under  this  assumption,  the  electromagnetic  fields 
associated  with  each  scattering  center  are  modeled  in  the  frequency  domain  as  com¬ 
plex  exponentials,  and  the  net  scattering  from  the  body  is  considered  to  be  the 
phasor  sum  of  all  the  individual  scattering  centers  (59).  Estimates  of  the  parame¬ 
ters  of  this  model,  when  applied  to  actual  radar  signals,  are  used  for  radar  imaging 
and  analysis.  Improved  accuracy  in  the  estimates  directly  improves  resolution  and 
increases  capabilities. 

Another  use  of  the  superimposed  exponential  model  is  seen  in  the  analysis  of 
signals  associated  with  the  free  induction  decay  of  nuclei  in  nuclear  magnetic  reso¬ 
nance  (NMR)  experiments.  Both  the  concepts  of  NMR  spectroscopy  and  magnetic 
resonance  imaging  (MRI)  are  predicated  on  the  assumption  that  a  volume  of  sample 
contains  a  finite  number  of  different  types  of  nuclei.  Each  type  of  nucleus  is  asso¬ 
ciated  with  the  unique  parameters  of  a  complex  exponential,  and  the  total  signal 
emanating  from  the  volume  is  considered  to  be  a  finite  sum  of  these  complex  expo- 
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nentials  (23:342).  Estimating  the  model  parameters  is  essential  to  identifying  the 
composition  of  samples  in  NMR  spectroscopy  or  rendering  an  image  in  MRI. 

Our  research,  in  particular,  is  motivated  by  a  class  of  applications  interested 
in  identifying  the  parameters  of  signals  modeled  by  the  sum  of  real  decaying  expo¬ 
nentials.  Signals  of  this  nature  arise  in  many  physics,  chemistry,  biophysics,  and 
biochemistry  experiments  (2).  Accurate  parameter  estimates  can  convey  important 
information  about  molecular  structure,  reaction  rates,  and  physical  make  up  of  the 
specimen. 

Typically,  exponential  or  multi-exponential  signals  are  measured  in  experi¬ 
ments  involving  some  type  of  perturbation.  An  external  forcing  parameter  is  pulsed 
or  stepped  on  to  a  system,  previously  in  an  equilibrium  state,  and  the  resulting  tran¬ 
sient  is  monitored.  For  example,  in  the  case  of  some  fluorescence  decay  experiments, 
an  exciting  lamp  flash  is  used  to  perturb  proteins,  protein  conjugates,  or  nucleic  acid 
conjugates  for  fluorimetric  observation  (32:1090).  In  relaxation  kinetics,  a  sudden 
change  in  a  chemical  system’s  temperature,  pressure,  or  electric  field  is  used  to  invoke 
a  concentration  change  in  one  or  several  species  of  the  system  (2:178).  Spectropho- 
tometric,  fluorimetric,  polarimetric,  or  conductometric  detection  methods  are  used 
to  monitor  the  transient  immediately  after  the  perturbation. 

A  specific  physics  experiment  that  requires  real  exponential  analysis  is  deep 
level  transient  spectroscopy  (DLTS).  DLTS  is  a  capacitance  transient  thermal  scan¬ 
ning  technique  used  to  characterize  defects  present  in  semiconductors  (45:3023). 
Good  characterization  of  the  defects  in  semiconductors  directly  impacts  semicon¬ 
ductor  manufacturing,  quality  assurance,  and  technical  advancement  in  general.  For 
these  reasons,  accurate  parameter  estimation  capability  in  the  DLTS  experiment  is 
critical.  In  the  latter  chapters  of  this  dissertation,  we  concentrate  on  the  real  ex- 
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ponential  parameter  estimation  problem  and  the  DLTS  experiment.  Ultimately,  we 
identify  improved  parameter  estimation  algorithms  created  for  DLTS. 

Before  developing  the  improved  algorithms,  we  review  the  applicable  contri¬ 
butions  that  have  already  been  made  for  superimposed  exponential  parameter  es¬ 
timation.  To  keep  the  review  as  general  as  possible,  the  derivations  and  examples 
presented  are  application  independent.  In  this  vein,  the  model  notation  is  now  in¬ 
troduced. 

1.2  Model 

We  assume  y[m]  is  an  observed,  complex  signal  plus  noise  element  from  a 
sequence  of  length  M  produced  by  N  superimposed  weighted  exponentials  with 
additive  white  Gaussian  measurement  noise, 

(1)  y[m]  =  s[m]  +  w[rn],  m  =  0, 1, ...,  M  —  1 


where 


(2) 


s  m  —  Cl  A™  “t-  C2^  -b  . . .  -f- 


Unless  specified  otherwise,  the  model  order  (or  number  of  modes),  N is  as¬ 
sumed  to  be  known  and  always  less  than  the  data  record  length,  M.  The  complex 
exponentials,  A„,  are  not  equal  to  zero  and  are  not  repeated,  i.  e.,  Xi  ^  Xj  for 
^  ^  i)  bi  =  1, 2, ...  The  complex  amplitude  coefficients,  c„,  are  also  not  equal 
to  zero.  Finally,  the  additive  measurement  noise  contributions,  w[m],  for  each  y[m\ 
are  complex  Gaussian  random  variables  uncorrelated  across  all  m  with  uncorrelated 
real  and  imaginary  components,  each  of  mean  zero  and  variance  a^/2. 
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Note,  the  Gaussian  noise  assumption  is  employed  to  scope  the  research.  Al¬ 
though  the  assumption  is  often  reasonable,  we  recognize  the  potential  for  it  to  be 
inappropriate.  Many  of  the  methodologies  developed  in  this  research  may  be  ex¬ 
tended  for  alternative  measurement  noise  assumptions,  but  we  defer  that  subject  to 
future  research. 

Also  note,  when  the  purely  real  signal,  y[m],  is  considered,  we  assume  the 
parameters  \n  and  c„  are  real.  Additionally,  the  additive  noise  contributions,  w[m], 
are  assumed  to  be  real  Gaussian  random  variables  uncorrelated  across  all  samples 
with  mean  zero  and  variance  o^. 

In  vector  notation,  the  observed  signal  and  noise  are  written  as 


(3) 

where 

y 

V 

v{Xn) 

c 

(4)  w 


y  =  Vc  +  w 


y[0]  y[l]  •  •  •  y[M  -  1] 


u(Ai) 

u(A2)  •  •  • 

'  v{Xn) 

1  A„ 

••• 

1 

1 _ 1 

T 

1 

T 

Cl  C2 

•  •  •  Cpf 

o 

u;[l]  ••• 

w\M  - 

-1] 

The  MxN  matrix  V  has  Vandermonde  structure.  In  Appendix  A  we  show  that 
with  distinct  exponentials,  A„,  the  columns  of  V  are  linearly  independent.  Therefore, 
with  M  >  N,  the  matrix  V  has  full  column  rank,  N. 
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1.3  Overview 


In  Chapters  II  and  III,  we  identify  two  competing  methodologies  for  estimating 
the  parameters  of  superimposed  weighted  exponentials:  maximum  likelihood  and  lin¬ 
ear  prediction.  Except  at  the  end  of  Chapter  III,  the  work  developed  in  Chapters  II 
and  III  is  background  information  already  presented  by  other  researchers.  In  Chap¬ 
ter  II,  we  review  the  maximum  likelihood  methodology  and  exalt  it  by  identifying 
its  relationship  to  the  Cramer-Rao  bound.  In  Chapter  III,  we  review  an  alterna¬ 
tive  methodology:  linear  prediction.  At  the  end  of  the  Chapter,  we  introduce  an 
extension  to  the  linear  prediction  methodology  that  allows  us  to  create  an  estimator 
equivalent  to  the  estimator  developed  from  maximum  likelihood.  This  original  find¬ 
ing  allows  us  to  simultaneously  utilize  the  insights  acquired  during  the  development 
of  linear  prediction  and,  in  general,  the  flexibility  of  the  linear  prediction  method, 
while  obtaining  an  estimator  with  performance  capabilities  approaching  the  Cramer- 
Rao  bound.  The  equivalence  of  the  two  estimators  is  shown  through  a  common  cost 
function  which  is  subject  to  minimization.  In  Chapter  IV,  a  detailed  review  of  tech¬ 
niques  for  minimizing  the  common  cost  function  is  given.  In  the  review,  we  introduce 
new  insight  into  the  minimization  problem  by  identifying  equivalent  minimization 
approaches  and  conditioning  issues  that  were  previously  overlooked.  In  Chapter  V, 
we  transition  to  an  analysis  for  the  restricted  case  of  superimposed  real  exponen¬ 
tials.  The  lessons  learned  in  the  previous  chapters  are  applied,  and  the  intricacies 
of  the  real  exponential  problem  are  considered.  Our  analysis  for  the  superimposed 
real  exponential  problem  is  completely  original  work  and  can  not  be  found  in  the 
literature. 

With  these  subjects  thoroughly  investigated,  our  knowledge  is  applied  to  the 
DLTS  application.  The  development  and  results  are  presented  in  Chapter  VI.  Be- 
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cause  of  our  preparation,  we  are  able  to  develop  estimators  for  DLTS  that  signifi¬ 
cantly  outperform  the  estimators  currently  in  use.  This  contribution  to  DLTS  has 
been  accepted  for  publishing  (31). 

In  Chapter  VII,  we  conclude  the  dissertation  by  reviewing  the  primary  and 
ancillary  contributions  of  our  research  and  addressing  areas  for  immediate  additional 
work. 
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11.  Maximum  Likelihood 


2.1  Chapter  Introduction 

A  parameter  estimation  method  is  coined  maximum  likelihood  (ML)  when  it 
works  to  maximize  the  probability  density  function  (PDF),  f{y',0),  associated  with 
the  observed  realization,  y,  and  the  assumed  signal  and  noise  model.  The  signal 
and  noise  model  is  parameterized  by  the  vector  6.  An  ML  estimate  of  6  is  rendered 
by  finding  the  parameters  that  maximizes  /(y;  6)  for  a  given  realization  of  y.  The 
premise  is  that  after  observing  y,  the  maximizing  parameters,  6,  are  the  most  likely 
parameters  of  the  signal  and  noise  model.  When  a  PDF,  f{y',0),  is  considered  in 
this  fashion,  with  a  known  y  and  unknown  6,  it  is  called  a  likelihood  function. 

Estimating  parameters  with  the  ML  approach  is  motivated  by  its  relation  to 
the  Cramer-Rao  bound  (CRB).  The  Cramer-Rao  bound  identifies  a  lower  bound 
for  the  variance  of  all  unbiased  estimators  that  is  unique  to  the  signal  and  noise 
model  (35:27).  Therefore,  it  provides  a  benchmark  against  which  all  unbiased  esti¬ 
mation  approaches  can  compete.  An  estimator  derived  from  the  ML  approach  is  not 
guaranteed  to  attain  the  CRB,  but  it  has  been  proven  that  if  an  unbiased  estimator 
can  attain  the  CRB,  the  ML  approach  will  provide  it  (35:186-187).  The  theory  of  the 
CRB  is  presented  at  the  end  of  this  chapter  and  developed  in  Appendix  C.  First,  we 
follow  the  methodology  of  Hogg  and  Craig  (24:80-89),  Kay  (35:500-508),  Ziskind  and 
Wax  (74:1554-1555),  Lanczos  (44:272-280),  and  Bresler  and  Macovski  (4:1082-1084) 
to  develop  an  ML  estimator  for  the  superimposed  weighted  exponential  problem.  All 
the  work  discussed  in  this  chapter  is  review.  It  is  presented  to  provide  background 
and  motivation  for  our  research. 
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2.2  Maximum  Likelihood  Estimator 

Temporarily  let  a  signal  plus  noise  element,  y[m],  take  the  complex  represen¬ 
tation  y[m]  =  y,.  +  jyi.  Also  let  the  signal  model  take  the  complex  representation 
s[m]  =  Sr  +  jsi,  where  yr,  yi,  Sr,  and  Sj  are  real.  Because  the  measurement  noise 
real  and  imaginary  components  are  Gaussian  distributed  with  mean  zero  and  vari¬ 
ance  cr^/2,  and  because  y[m]  is  a  linear  combination  of  the  deterministic  5[m]  and 
stochastic  w{m],  yr  and  y,  are  distributed  as 


(5) 


/ (2/r)  Sr,(r  )  — 


'{yr  ^r) 


and 


(6) 


f{yi]Si,(j'^)  =  ■■■; - -exp 


-{yi  -  Sif 


Because  the  measurement  noise  real  and  imaginary  components  are  uncorrelated  and 
Gaussian,  they  are  stochastically  independent.  Therefore,  the  PDF  f{yr,  yi',  Sr,  Si,  a^) 
is  the  product  of  f{yr',  Sr,  cr^)  and  /(j/*;  s*,  cr^)  so  that 


(7) 


f{yr,  yi]  Sr,  Si,  (7^)  =  ^  CXp - \  ({yr  -  Srf  -f  {yi  -  Sif) 

77(7  L  <7  '  ^ 


Incorporating  the  definitions  of  y[m]  and  s[m].  Equation  7  becomes 

-^IbH  -5MII2 

where  ||  •  II2  represents  the  2-norm. 

Likewise,  because  each  element,  y[m],  is  uncorrelated  and  Gaussian — therefore 
stochastically  independent — the  PDF  of  the  vector  y  is  the  product  of  the  PDFs  for 


(8) 


f{y[m]]s[m],a^)  =  — 5  exp 


TTCr^ 
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each  y{m]  so  that 


M-l 


(9) 


f{y,V,c,a^)  =  Yi  f{y[m]]s[m],a^) 

m=0 

M-l 


“3  E  II2/N  -5NII2 


m=0 


In  equivalent  vector  notation, 


(10) 


f{y]  V,  c,  cr^)  = 


where  ^  is  the  conjugate  (Hermitian)  transpose. 
In  abbreviated  notation, 


(11)  y^CN{Vc,<yH) 

implies  y  is  distributed  complex  Gaussian  (complex  normal)  with  mean  vector  Vc 
and  covariance  matrix  uH.  The  identity  matrix,  I,  is  of  dimension  M  x  M.  Notice 
that  the  common  and  uncorrelated  variance,  <7^,  affects  a  covariance  matrix  with 
constant  diagonal  only  terms. 

Returning  to  the  ML  task  of  finding  the  V,  c,  and  that  maximize  f{y-,  V,  c,  a^) 
for  the  observation  y,  we  find  it  easier  to  first  take  the  natural  logarithm  of  /(y;  V,  c,  a^) 
so  that 


(12) 


L{V,c,a^)  = 


In  f{y;  V,  c,  a^) 

— MlnTT  —  Min  (7^  — ^(y  —  Vc)^{y  —  Vc). 
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The  function  L  is  known  as  the  log-likelihood  function,  and  because  the  natural 
logarithm  is  a  monotonic  function,  L  will  maximize  at  the  same  parameters  as  the 
PDF.  Additionally,  since  Mlnvr  is  not  a  function  of  V,  c,  or  cr^,  it  is  equivalent  to 
maximize 

(13)  L{V,c,a^)  =  -Mlna^  -  ^{y  -  Vc)^{y  -  Vc). 


We  also  know  at  the  maximum. 


(14) 


dL(V,c,a'^) 

da^ 


=  0 


where 


(15) 


dL(V,cy)  M  1 


2 


{y-Vcf{y-Vc). 


Therefore,  at  the  maximum  of  L, 


(16)  =>  a^M  =  {y —  Vc)^{y —  Vc) 

^  =  M(y-l^c)^(y-'Pc). 


If  we  substitute  the  expression  for  cr^  back  into  L,  we  can  now  say  we  need  to 
maximize 


(17) 


L{V,c)  =  -Min  -Vc)^{y -Vc)^  -  M. 
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Again,  because  M  is  a  constant  and  the  natural  logarithm  is  a  monotonic  function, 
maximizing  L  is  equivalent  to  minimizing 


(18) 


L(V,c)  =  (y-Vc)''{y-Vc). 


To  simplify  L  further,  we  can  remove  the  dependence  on  c  by  considering  the 
current  expression  for  L  at  its  minimum,  but  before  doing  so,  we  need  to  define  real 
and  complex  vector  gradients.  Our  definitions  are  consistent  with  Kay  (35:517-527) 
and  Therrien  (70:686-690).  We  define  the  gradient  of  a  scalar  L  with  respect  to  a 
real  parameter  vector  6^  as 

dL 
dOA 

dL 

d8r2 

dL 
dSrN 

At  a  stationary  point. 


(19) 


dL 

dOr 


(20) 


dL 

de^ 


=  0 


where  the  notation  0  defines  a  vector  of  zeros.  If  we  define  the  complex  parameter 
vector 


(21) 


0  —  Or  +  jOi, 
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where  6^.  and  6i  are  both  real  vectors,  we  can  also  define  the  gradient  of  any  scalar 
L  with  respect  to  a  complex  parameter  vector  6  as 


dL  1  fdL 


de  2[der  ^  dOi 


^  _  1 

W*~  2\dL  dOi 


where  *  denotes  the  complex  conjugate. 

Notice  that  the  log-likelihood  function  is  a  real  scalar  function.  These  gradient 
definitions  allow  us  to  calculate  a  stationary  point  for  a  real  scalar  function  with 
complex  parameters.  Alternative  definitions  do  not  allow  this  because  a  real  scalar 
function  with  complex  parameters  is  not  analytic  (35:518).  At  a  stationary  point, 


86^  ddi 


Therefore, 


dL  - 
89  ~ 


are  both  valid  gradient  expression  for  determining  stationary  points. 
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Returning  to  the  ML  estimator  development,  we  know 


(27) 


dL{V,  c)  _  d{y^y  —  c^V^y  —  y^Vc  +  c^V^Vc) 
dc*  dc* 


Under  the  gradient  definition  just  presented, 


(28) 


9{y’'y) 

dc* 

djc^V^y) 

dc* 

d{y^Vc) 

a? 

djc^V^Vc) 

dc* 


0 

0 

V^y 

V^Vc. 


This  implies 


(29) 


dL(V,c) 

dc* 


=  -V^y  +  V^Vc. 


The  last  three  gradients  of  Equation  28  are  derived  in  Appendix  B.  Therefore,  at 
the  minimum  of  L, 


(30) 


V^Vc  =  V^y 
c^{V^V)-^V^y. 


Recall  that  V  is  of  full  rank  N  with  M  >  N.  Therefore,  V^V  is  also  of  full  rank  N, 
and  (y^V)^^  exists. 
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This  expression  for  c  can  be  substituted  into  L  so  that  minimizing  L  is  equiv¬ 
alent  to  minimizing 


L{V)  =  {y-V{V^V)-'^V^yf{y-V{V^V)-^V^y) 

=  ((I  -  V{V^V)-^V^)y) ""  ((I  -  V{V^V)-W^)y) 

=  y^{l  -  V{V^V)-W^)^{1  -  ViV^Vy^V^)^ 

(31)  =  f{i  -  viv^vyvyg. 

Equation  31  is  the  exact  maximum  likelihood  cost  function  for  the  superimposed 
exponential  problem  reduced  to  just  the  N  A„  parameters  in  V.  The  expression 

(32)  (I  -  V{V^Vyvyy 

is  recognized  as  the  projection  of  the  observation  vector  y  onto  the  orthogonal  com¬ 
pliment  of  the  range  space  of  the  columns  of  V.  Under  the  maximum  likelihood 
approach,  we  desire  the  exponential  parameters,  A„,  for  the  columns  of  V  that  mini¬ 
mize  this  projection.  If  the  minimizing  A„  can  be  accurately  estimated,  we  can  then 
estimate  the  amplitude  coefficients,  c,  by  solving 

(33)  c  =  {V^VyV^y. 

With  V  and  c  estimated,  we  can  even  estimate  the  measurement  noise  variance,  cr^, 
with  the  earlier  expression 

(34)  =  ^(^y-Vcy{y-Vc). 
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The  range  space  of  the  columns  of  V  is  referred  to  as  the  signal  subspace,  and 
its  orthogonal  compliment  is  referred  to  as  the  annihilator  subspace.  The  maximum 
likelihood  solution  seeks  to  minimize  the  projection  of  the  observation  vector  onto 
the  annihilator  subspace.  To  accomplish  this  by  finding  the  A„  that  minimize  the 
cost  function  T  is  a  highly  nonlinear  task.  With  some  astute  insight,  the  process 
can  be  simplified  by  building  an  alternative  projection  matrix  specifically  for  the 
annihilator  subspace.  What  follows  is  the  derivation  of  such  a  projection  matrix. 

We  know  from  differential  equation  theory,  the  noiseless  signal  model 

(35)  5[m]  =  Cl  A™  +  C2A™  +  . . .  +  cjv^A)^ 

is  a  solution  to  the  homogeneous,  constant  coefficient,  linear  difference  or  linear 
prediction  (LP)  equation 

(36)  5o5[m]  +  6is[m  —  1]  + . . .  +  bffs[m  —  N]  =  0. 

To  see  this,  substitute  the  solution  into  the  LP  equation  so  that 

5o  [ciA™  +  C2A^  +  . . .  +  cjvA^]  + 

bi  [ciA™  ^  +  C2A^  ^  +  . . .  +  cjvA^  ^  +  . . .  + 

bN  [ciAr-^^  +  C2xr^  + . . .  +  cnXt'']  =  0 

[^qCiA™  +  5iCiA^  ^  +  . . .  +  fejvCiA^”  ^  + 

[5oC2A^  +  61C2A™  ^  +  .  .  .  +  6jvC2A^  ^  +  . . .  + 

[^qCjvAjv  +  biCjfX^  ^  +  . . .  +  bifCj^X^  ^  =  0 

=>  ciAr'^[Mf +Mr^+...+5iv]  + 

C2A™  ^  boX^  +  biX2  ^  +  . . .  +  6iv’]  +  . . .  + 
cnX^  ^  boX^  +  biX^  ^  +  . . .  +  feiv"]  =  0. 
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Because  c„  and  A„  are  non  zero,  the  LP  coefficients,  6o  •  •  •  must  be  such  that 


(38) 


^iV-l  •  •  •  bo 


X^ 


=  0 


for  all  n.  Also,  the  roots  z  of  the  characteristic  polynomial  formed  from  the  LP 
coefficients 


(39)  boz^  +  biz^  ^  +  . . .  +  b]^ 

are  equal  to  the  exponentials,  A„. 

If  we  propose  the  (M  —  N)  x  M  Toeplitz  matrix 


bN  bff-i 

bo 

0  •• 

•  0 

(40) 

B  = 

0  b^ 

bN-i 

.  .  . 

bo 

.  0 

0  ••• 

0 

bff 

bN-i  • 

•  bo 

we  can  see 

(41) 

BV 

=  0. 

Thus,  every  row  of  B  is  orthogonal  to  every  column  of  the  Vandermonde  matrix  V 
described  earlier.  Also,  because  each  row  of  B  is  linearly  independent,  the  rank  of 
B  is  M  —  N.  Therefore,  the  rows  of  B  form  a  basis  for  the  orthogonal  compliment 
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of  the  range  space  of  the  columns  of  V,  and  through  the  projection  matrix,  we  know 


(42)  =  l-V{V^Vy'^V^. 

When  this  information  is  brought  back  to  the  maximum  likelihood  derivation, 
minimizing  L  is  equivalent  to  minimizing  the  alternative  maximum  likelihood  cost 
function 

L(b)  =  y^B^{BB^)-'^By 


(43) 

=  b^Y^{BB^)-^Yb 

where 

Y  = 

Vo  yi 

Vn 

Vn  = 

y[N  -  n] 

y[N  +  1  —  n]  •  •  •  y[M  —  1 

r 

-iT 

(44) 

b  = 

O 

•  brf 

The  data  matrix  Y  is  (M  —  N)  x  (N  +  1)  with  Toeplitz  structure.  If  we  let 

(45)  R{b)  =  BB^, 
we  can  rewrite  L  as 

(46)  L{h)  =  WY^R{h)-^Yh. 
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Minimizing  the  alternative  L  is  still  a  nonlinear  task,  but  if  we  use  a  previous 
estimate  of  6,  labeled  6j_i,  to  create  a  fixed  we  attain  the  quadratic  expression 

in  hi 


(47)  hfY^R{hi_,)-^Yhi 

where  hi  identifies  the  current  unknown.  Techniques  for  finding  the  hi  that  mini¬ 
mizes  a  quadratic  are  well  known  and  are  examined  further  in  Chapters  III  and  IV. 
An  iterative  approach  that  utilizes  this  quadratic  expression  naturally  follows  and 
is  known  as  the  iterative  quadratic  maximum  likelihood  (IQML)  algorithm.  The 
algorithm  is  most  concisely  presented  in  a  paper  by  Bresler  and  Macovski  (4),  but 
its  origins  and  alternative  deliveries  can  be  found  in  papers  by  Evans  and  Fischl  (15) 
and  Kumaresan,  Scharf,  and  Shaw  (39). 

It  is  important  to  emphasize  the  IQML  algorithm  does  not  directly  minimize 
the  maximum  likelihood  cost  function  but  rather  a  quadratic  approximation  of  it. 
Nevertheless,  the  IQML  algorithm  is  the  most  popular  and  accurate  of  all  parameter 
estimation  schemes  for  the  superimposed  exponential  problem. 

2.3  Cramer- Rao  Bound 

Confidence  in  all  ML  based  estimators  is  gained  when  the  relationship  between 
maximum  likelihood  and  the  Cramer-Rao  bound  is  understood.  In  this  section,  key 
properties  of  the  theory  of  the  CRB  are  presented  to  show  the  relationship.  In 
Appendix  C  the  theory  necessary  to  justify  the  properties  is  developed.  For  both 
this  section  and  the  appendix,  the  books  of  Scharf  (61:221-227),  Kay  (35:30-45), 
and  Papoulis  (54:260-265)  were  the  primary  references. 
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Ideally,  we  desire  a  function  of  the  data,  g{y),  that  provides  the  best  estimate  of 
the  parameters  9  where,  in  our  case,  0  is  a  vector  representation  of  all  the  parameters 
in  V,  c  and  Recall  the  log-likelihood  function,  initially  defined  in  Equation  12, 
where 


(48)  L{S)=\nmS). 

In  deriving  the  alternative  majdmum  likelihood  cost  function 

(49)  L{b)  =  WY^R{b)-^Yb, 


we  have  shown  that  finding  the  b  that  minimizes  L{b)  is  equivalent  to  finding  the 
9  that  maximizes  L{9).  Therefore,  the  IQML  algorithm  serves  as  a  function  of  the 
data,  g{y),  that  estimates  9  for  maximizing  L{9). 

In  developing  the  IQML  algorithm,  and  all  other  ML  based  estimators,  we 
exploit  the  requirement  that  at  the  maximum, 

r'iOi  ^  dlnfiy,  9)  ^ 

89  89  ' 


The  gradient  of  the  log-likelihood  function 


(51) 


8lnf{y-,9) 

89 


is  a  key  concept  for  the  theory  of  the  CRB  and  links  maximum  likelihood  with  the 
CRB. 

Before  identifying  the  significant  properties  of  the  CRB,  we  need  to  review  and 
develop  a  few  more  concepts.  If  g{y)  is  an  unbiased  estimator  of  9,  then  the  mean 
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vector  or  expected  value  of  g{y)  is  equal  to  6.  Symbolically, 

/OO  -  - 

9{y)f{y\G)dy  =  6. 

-oo 

The  covariance  matrix  of  g{y)  is  defined  as 

(53)  C  =  E  {{g{y)  -  E{g{y)}){g(y)  -  E{g{y)}f}  , 

but  for  the  unbiased  estimator 

(64)  C  =  E  {{g{y)  -  S){g{y)  -  Sf}  . 

The  expression 

(55)  g{y)  -  0 


is  known  as  the  estimator  error  vector. 

Returning  to  the  gradient  of  the  log-likelihood  function,  L,  we  find  that  if  a 
“regularity”  condition  is  satisfied. 


(56) 


gln/(j/;^) 

de 


has  statistical  properties  of  its  own.  The  “regularity”  condition  assumes 


(57) 


)  too  _  /•O 

I/  f{y-,d)dy=  f 

U  J  —  oo  J — c 


d9  j—oo 


de  ^ 


This  is  generally  true  whenever  the  domain  of  the  non-zero  portion  of  the  PDF  is  not 
a  function  of  6  and  is  definitely  true  in  our  application.  The  need  for  this  condition 


20 


is  critical  to  the  derivation  of  the  CRB  in  Appendix  C  and  is  recognized  here  to 
satisfy  the  last  concept  required  for  identifying  the  following  significant  properties 
of  the  theory  of  the  CRB: 

Property  One.  If  a  PDF,  /(y;  d),  satisfies  the  “regularity”  condition,  then  the  error 
covariance  matrix,  C,  of  any  unbiased  estimator,  ff(y),  must  satisfy  the  condition  that 
C  —  F(0)~^  is  positive  semi-definite.  The  matrix,  F(0),  is  the  Fisher  information 
matrix  and  is  defined  as  the  covariance  matrix  of  the  gradient  of  the  log-likelihood 
function 


(58) 


For  the  difference  of  matrices,  C  —  F(0}~^,  to  be  positive  semi-definite,  each  element 
of  the  diagonal  of  C  must  be  greater  than  or  equal  to  each  element  of  the  diagonal 
of  F(0)~^.  This  implies 


(59)  =  B  {(9(5).  -  «.)(9(S)„  -  «.)"}  >  F{0):1 


In  words,  the  nnih.  element  of  the  inverse  of  the  Fisher  information  matrix 
is  a  lower  bound  on  the  variance  of  the  nth  parameter  estimated  by  the  unbiased 
estimator  g{y).  Calculating  the  Fisher  information  matrix  for  a  known  PDF  to 
test  the  performance  of  an  unbiased  estimator  is  straight  forward.  For  the  complex 
superimposed  exponential  problem,  closed  form  expressions  for  the  calculations  are 
presented  by  Steedly  and  Moses  in  (67). 
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Property  Two.  An  unbiased  estimator,  g'(y),  may  be  found  that  attains  the  CRB, 
in  that  C  =  F{6)~^,  if  and  only  if 

(60)  = 

The  resulting  estimator,  g{y),  is  the  minimum  variance  unbiased  (MVU)  estimator 
of  e. 

With  regards  to  the  second  property,  the  ability  to  manipulate  the  log-likelihood 
function  into  the  form 

(61)  F(e)  {g(y)  -  0) 

may  be  difficult.  This  is  definitely  the  case  for  our  superimposed  exponential  prob¬ 
lem,  but  we  can  demonstrate  the  capability  by  considering  a  simplified  version  of  the 
same  problem.  Assume  the  exponentials,  A„  in  V,  and  the  noise  variance,  of  the 
superimposed  exponential  PDF,  f{y',V,c,cr^),  are  known.  Therefore,  the  amplitude 
coefficients,  c,  are  the  unknown  parameters  to  be  estimated.  Prom  the  development 
in  the  last  section,  the  log-likelihood  function  of  the  superimposed  exponential  PDF 
is 

(62)  L(c)  =  — MlnTT  —  Mlncr^  — —  Vc)^{y  —  Vc). 
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Since  L  in  this  case  is  only  a  function  of  c,  we  can  maximize  L  by  solving  for  c  at 
the  stationary  point  ||  =  0  or  ^  =  0.  From  Appendix  B  we  know 

=  ^  (''"s  - 

(63)  =  ^  {(V’V)-'V’’y  -  e)  . 

From  the  second  property  of  the  theory  of  the  CRB, 

(64)  F-\c)  =  C^ 
and 

(65)  g{y)  =  {V’'vr'V’‘y. 

Recall  from  the  development  of  the  ML  estimator  in  the  previous  section,  after 
accurately  estimating  A„,  the  amplitude  coefficients,  c,  are  estimated  by  solving  the 
same  expression 

(66)  c  =  {V^Vy^V^y  =  g{y). 

Furthermore,  knowing  y  is  distributed 

(67)  y  ~  CN{Vc,  aH), 
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we  can  use  the  linear  transformation  or  pseudo-inverse,  (V^V)  from  Equa¬ 
tion  66  to  determine  the  distribution  of  the  estimator  g{y)  (61:59).  The  result  is 

g{y)  ~  Af  V^Vc,  (V^Vy^  V^aHV  {v^v) 

^  9{y)-^^f  {c,cTyv^vy^y 

Indeed,  g[y)  is  an  unbiased  estimator  with  mean  vector  c  and  covariance  matrix 
a‘^{V^V)~^  as  identified  from  the  second  property  of  the  theory  of  the  CRB. 

This  exercise  highlights  the  relationship  between  maximum  likelihood  estima¬ 
tors  and  the  CRB.  Although  we  did  not  use  the  second  property  of  the  CRB  for 
developing  the  ML  estimator  of  the  completely  unknown  superimposed  exponential 
problem,  we  do  use  the  first  property  to  calculate  the  CRB  for  testing  all  the  unbi¬ 
ased  estimators  discussed  in  this  dissertation.  Additionally,  the  second  property  is 
implicitly  referenced  anytime  we  claim  an  estimator  to  be  MVU. 

2.4  Chapter  Conclusion 

In  this  review  chapter,  we  developed  the  maximum  likelihood  cost  function  for 
the  superimposed  exponential  parameter  estimation  problem.  We  showed  equivalent 
relationships  for  representing  the  ML  cost  function  in  terms  of  all  the  parameters, 
V,  c,  and  cr^;  just  the  exponentials,  V;  and  even  the  linear  prediction  coefficients, 
b.  With  a  previous  estimate  of  the  LP  coefficients,  we  identified  how  the  ML  cost 
function  can  be  approximated  by  a  quadratic  for  a  new  b  estimate. 

The  impetus  for  developing  a  maximum  likelihood  estimator  comes  form  its 
relationship  to  the  Cramer-Rao  bound.  We  showed  this  relationship  and  identified 
two  significant  properties  of  the  theory  of  the  CRB.  At  this  point,  we  can  conclude 
that  the  ML  approach  is  a  logical  methodology  for  finding  the  estimator  that  is,  or 
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gets  as  close  as  possible  to  being,  the  MVU  parameter  estimator  for  the  superimposed 
exponential  problem.  In  the  next  chapter,  we  present  an  alternative  methodology, 
solely  based  on  linear  prediction,  for  comparison. 
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III.  Linear  Prediction 


3.1  Chapter  Introduction 

Linear  prediction  (LP)  based  parameter  estimators  for  the  superimposed  ex¬ 
ponential  problem  date  back  to  1795  and  were  originally  proposed  by  le  Baron  de 
Prony  (9).  Although  initially  inadequate,  they  have  been  continually  improved  upon. 
In  this  chapter,  we  begin  with  the  original  Prony  estimator  and  highlight  the  sig¬ 
nificant  improvements  made  to  LP  estimators  over  their  history.  We  culminate  the 
chapter  by  deriving  our  own  linear  prediction  based  estimator  that  possesses  a  cost 
function  identical  to  that  of  the  maximum  likelihood  (ML)  cost  function  developed 
in  Chapter  11.  With  an  identical  cost  function,  the  LP  estimator  acquires  all  the 
desirable  attributes  associated  with  ML  and  the  theory  of  the  Cramer-Rao  bound 
(CRB).  Establishing  an  equivalence  between  the  linear  prediction  based  methodology 
and  the  maximum  likelihood  based  methodology  is  an  original  contribution  of  this 
dissertation.  Giving  linear  prediction  the  capability  to  possibly  obtain  the  minimum 
variance  unbiased  (MVU)  estimator  provides  an  alternative  to  the  ML  methodology 
and  allows  us  to  retain  the  critical  insight  gained  during  linear  prediction’s  long  and 
arduous  development.  In  Chapter  VI,  this  insight  leads  to  a  significantly  improved 
estimator  for  deep  level  transient  spectroscopy  (DLTS)  applications.  The  same  esti¬ 
mator  would  be  extremely  difficult  to  directly  develop  from  maximum  likelihood. 

3.2  Least  Squares 

Consider  the  noiseless  superimposed  exponential  signal 
(69)  s[m]  =  ciA™ -f  C2A™ -f  . . . -f 
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From  the  equation,  it  is  apparent  that  each  element,  s[m],  is  a  linear  weighted 
combination  of  the  A„  exponentials.  If,  once  again,  we  assume  the  exponentials 
are  known  and  the  weights  or  amplitude  coefficients,  c„,  are  the  parameters  to  be 
estimated,  we  are  inclined  to  solve  for  ci . . .  cjv  with  a  system  of  N  linear  equations 
and  N  unknowns.  Only  N  realizations  of  s[m]  are  necessary  to  construct  an  exactly 
determined  system  of  linear  equations,  implying  that,  without  noise,  the  solutions 
for  Cl . . .  cjv  are  exact. 

When  the  exponentials  are  also  unknown,  the  linear  relationship  in  Equation  69 
is  no  longer  apparent.  Prony  has  been  credited  with  realizing  that  the  summed 
exponential  signal  of  Equation  69  is  the  solution  to  a  linear  difference  or  linear 
prediction  equation  (61:489) 

(70)  +  bis[m  —  1]  +  . . .  +  bi^s[m  —  N]  =  0. 

This  fact  was  demonstrated  in  Chapter  II.  Recall  that  the  exponentials,  A„,  are  the 
roots  of  the  polynomial  formed  by  the  linear  prediction  coefficients,  bo. .  .bjf.  The 
LP  polynomial  takes  the  form 

(71)  boz^  +  biz^  ^  +  . . .  +  bif. 

Notice  that  Equation  70  is  a  linear  weighted  combination  of  consecutive  re¬ 
alizations  of  s[m].  Once  again,  we  are  inclined  to  construct  an  exactly  determined 
system  of  linear  equations  to  solve  for  the  LP  coefficients  bo ..  .bi^.  If  we  assume 
bo  normalizes  to  1,  the  soluble  roots  of  the  LP  polynomial  are  unaffected,  and  only 
2N  realizations  of  s[m]  are  necessary  to  develop  N  equation  for  the  N  unknowns, 
fei . . .  fejv 
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After  solving  bi. .  .b^  and  rooting  the  exponentials,  A„,  from  the  LP  polyno¬ 
mial,  the  amplitude  coefficients,  Cn,  can  be  exactly  determined  from  the  previous 
system  of  linear  equations  based  on  the  noiseless  signal  model  Equation  69.  This 
methodology  is  known  as  the  original  Prony  method.  It  conveys  the  fundamental 
concepts  of  linear  prediction  in  the  context  of  parameter  estimation  for  the  super¬ 
imposed  exponential  problem.  The  elegance  of  this  estimation  technique  is  easily 
overlooked.  It  embodies  the  classic  approach  of  decomposing  a  nonlinear  problem 
into  separate  problems  which  are  solved  using  linear  methods,  and  as  long  as  the 
data  is  completely  modeled  and  noiseless,  an  exact  solution  is  obtained. 

To  emphasize  further,  we  will  apply  the  original  Prony  method  to  four  data 
points  from  a  noiseless  two  mode  signal.  Step  One  of  the  original  Prony  method — and 
all  other  LP  based  methodologies  for  estimating  the  parameters  of  a  superimposed 
exponential  signal — requires  the  construction  of  the  system  of  linear  equations 

5 [2]  -f  6i5[1]  -p  625 [0]  =  0 

(72)  5(3]  4-  6is[2]  -|-  62'S[1]  =  0 


or 


(73) 

>[1] 

.[0] 

bi 

_ 

-.[2] 

s[2l 

s[l] 

&2 

-.[3] 

for  solving  for  the  LP  coefficients  61  and  62.  After  solving  for  61  and  62  we  root  Ai 
and  A2  from  the  LP  polynomial 

(74)  -I-  biz  -I-  62- 
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Step  Two  requires  the  construction  of  the  system  of  linear  equations 

s[0]  =  C1A5  +  C2A2 
(75)  s[l]  =  ciAj;  +  C2A2 

or 


(76) 

1  1 

Cl 

s[0] 

Ai  A2 

C2 

s[l] 

for  solving  for  the  amplitude  coefficients  ci  and  C2. 

When  noise  is  added  to  the  signal,  neither  the  system  of  linear  equations  for 
the  LP  coefficients  nor  the  system  of  linear  equations  for  the  amplitude  coefficients 
yield  exact  solutions.  Because  the  LP  coefficient  estimator  is  developed  solely  from  a 
noiseless  LP  equation,  its  estimates  are  easily  corrupted  by  noise  (44:275).  To  make 
matters  worse,  in  most  LP  methodologies,  the  amplitude  coefficient  estimates  from 
Step  Two  are  based  on  exponential  estimates  from  the  roots  of  the  LP  polynomial 
formed  from  LP  coefficient  estimates  in  Step  One.  Inaccuracies  in  the  LP  coefficient 
estimates  from  Step  One  are  compounded  throughout  the  algorithm. 

To  further  analyze  the  effects  of  noise  and  introduce  improvements  made  to 
linear  prediction,  consider  all  the  data  points  of  the  noisy  signal  in  the  vector  notation 

(77)  y  =  Vc-^w 

or 

(78)  y  —  Vc  =  w. 
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Even  though  we  have  an  assumed  distribution  for  w,  the  assumption  is  unnecessary 
for  the  development  that  follows.  The  vector  w  is  simply  the  error  vector  between 
the  observed  noisy  signal,  y,  and  the  assumed  model,  Vc.  One  possible  approach  to 
estimating  the  parameters  of  the  assumed  model  is  to  search  for  the  A„  and  c„  that 
minimizes  the  sum  of  the  squared  error 

J{V,c)  =  w^w 

(79)  =  {y-Vcf{y-Vc). 

This  approach  is  known  as  least  squares  (LS),  and  in  this  case,  the  least  squares 
cost  function,  J{V,c),  is  identical  to  the  maximum  likelihood  cost  function,  L{V,c) 
developed  in  Chapter  II. 

With  unknown  V  and  c,  minimizing  J{V,  c)  is  a  non-linear  least  squares  prob¬ 
lem,  but  for  further  insight,  once  again  assume  the  exponentials  are  known.  Min¬ 
imizing  J(c)  is  a  linear  least  squares  problem  and  in  Chapter  II  we  developed  the 
minimum  solution  to  be  the  pseudo-inverse  estimator 

(80)  c  =  {V^Vy^V^y. 

We  can  prove  the  pseudo-inverse  estimator  obtains  a  unique  minimal  solution  by 
completion  of  squares.  Let  Cq  =  {V^Vy^V^y  and  expand  J(c)  so  that 

■7(c)  =  (^c  -  V"(c  -f  Co  -  co))^(yc  -V{c  +  co-  cq)) 

(81)  =  {y-c  -  Vcofiy,  -  Vco)  +  (c  -  c^yV^V{c  -  cq). 
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The  left  term  of  Equation  81  is  not  a  function  of  c  and  is  therefore  fixed  by  the  data. 
The  right  term  of  Equation  81  is  of  quadratic  form  in  c.  In  Appendix  A  we  show  V 
has  full  column  rank.  Because  V  has  more  rows,  M,  than  columns,  N ,  we  also  know 
V^V  is  positive  definite  (33:26).  This  implies 

(82)  (c-co)'^y^F(c-co)  >0 

for  all  c  ^  cq.  Therefore,  the  unique  minimum  J(c)  occurs  when  c  =  cq  =  (y^V)~^V^y. 
Also,  recall  that  when  the  uncorrelated  zero  mean  Gaussian  distributional  assump¬ 
tion  about  w  is  made, 

(83)  c  =  {V^Vy^V^y 

is  the  MVU  estimator  for  the  simplified  superimposed  exponential  problem. 

Therefore,  when  considering  the  completely  unknown  parameter  estimation 
problem,  if  we  can  accurately  estimate  the  exponentials  in  Step  One  of  the  LP 
methodology,  we  can  utilize  the  pseudo-inverse  estimator  to  attain  the  best  possible 
estimate  of  the  amplitude  coefficients,  even  in  noise.  Thus,  linear  prediction  research 
for  the  superimposed  exponential  parameter  estimation  problem  is  focused  on  Step 
One  of  the  LP  methodology:  estimating  the  linear  prediction  coefficients. 

As  mentioned  before,  the  LP  equation 

(84)  bQs[Tn]  +  bis[m  —  1]  -I- ...  -f  &jvs[m  —  A^]  =  0 
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is  only  exact  under  the  noiseless  signal  assumption.  If  we  substitute  the  noisy  real¬ 
izations,  y[m],  for  the  noiseless  realizations,  s[m],  we  can  say 

(85)  boylm]  +  biy[m  -  1]  -1- 5ivy[m  -  A^]  =  e[m] 

where  e[m]  is  the  unknown  error  invoked  by  the  substitution.  Using  all  the  available 
data,  we  can  construct  the  overdetermined  system  of  linear  equations 

(86)  Yb  =  e 

where 

^  =  yo  yi  vn 

■iT 

yn  =  y[N  —  n]  y[N  -I- 1  —  n]  •  •  •  y[M  —  1  —  n] 

-  r 

b  =  bo  bi  ••  •  bpf 

-,T 

(87)  e  =  e[A^]  e[iV -f- 1]  •••  e[M  - 1]  • 

The  data  matrix,  Y,  has  dimensions  (M  —  N)  x  {N  +  1)  and  Toeplitz  structure. 
With  the  previous  least  squares  development,  it  is  logical  to  assume  we  can  improve 
our  LP  coefficient  estimates  by  minimizing  the  sum  of  the  squared  error 

J{b)  =  e^e 

=  {Yb)^{Yb) 

(88)  =  b^Y^Yb. 
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Before  continuing  further,  note  that  unlike  J(c),  j(b)  is  not  identical  to  its 
counterpart,  L{b),  developed  with  the  maximum  likelihood  methodology  in  Chap¬ 
ter  11.  Under  maximum  likelihood. 


(89) 


L{b)  =  b^Y^{BB^)-'^Yb. 


Therefore,  the  b  that  minimizes  J{b)  is  not  a  maximum  likelihood  estimate. 

In  the  iterative  quadratic  maximum  likelihood  (IQML)  algorithm,  we  addressed 
this  by  using  a  previous  estimate  of  b  to  calculate  and  fix  {BB^)~^  for  a  new  estimate 
of  b.  If  an  initial  estimate  of 


(90) 


6  = 


1  0  ••• 


0 


T 


is  used  for  the  IQML  algorithm,  the  resulting  matrix 


(91) 


=  I 


shows  that  the  initial  estimate  is  equivalent  to  starting  the  IQML  algorithm  with 
a  least  squares  estimate.  This  is  a  popular  technique  for  initializing  the  IQML 
algorithm  and  is  discussed  in  detail  in  Chapter  V. 

Historically,  the  least  squares  improvement  was  introduced  to  linear  prediction 
by  Hildebrand  in  the  1950s  (22:457-462).  At  the  time,  the  relationship  between  the 
least  squares  cost  function  and  the  maximum  likelihood  cost  function  was  unknown. 
Dubbed  the  extended  Prony  method,  the  least  squares  modification  offers  consider¬ 
able  improvement  over  the  original  Prony  method  (33:225).  The  covariance  method 
presented  by  Makhoul  in  1975  is  identical  to  the  extended  Prony  method  but  is  de- 
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rived  under  entirely  different  assumptions  (50:564).  In  this  dissertation,  we  find  it 
informative  to  reference  the  extended  Prony  method  as  the  least  squares  algorithm. 

Before  demonstrating  the  improved  performance  gained  by  the  least  squares 
algorithm  over  the  original  Prony  method,  we  will  further  define  the  least  squares 
LP  coefficient  cost  function  J{b).  Let 


(92) 


Y  = 


yi  y2  yN 


and 


(93) 


SO  that 


(94) 


J{b)  =  (Yb)‘‘(Yl) 

=  (yo  +  Ybfi^a  +  Yb) 

=  J(i)- 


Again  we  have  assumed  bo  normalizes  to  1.  Also,  Y  is  an  (M  —  N)  x  N  data  matrix 
with  Toeplitz  structure.  In  Appendix  A  we  describe  how  Y  is  assumed  to  have 
full  column  rank  N.  Therefore,  we  assume  Y^Y  is  invertible  and  like  J(c),  we  can 
minimize  j(b)  with  the  unique  pseudo-inverse  solution 

(95)  b  =  -{Y^Y)-^Y^yo. 


In  review,  the  least  squares  algorithm,  for  estimating  the  parameters  of  super¬ 
imposed  exponential  signals,  follows  the  same  LP  methodology  as  the  original  Prony 
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method.  However,  in  both  the  LP  coefficient  estimation  portion  and  the  amplitude 
coefficient  estimation  portion,  the  exactly  determined  solutions  of  the  original  Prony 
method  are  replaced  with  overdetermined,  least  squares,  solutions. 

To  show  the  superior  performance  of  the  LS  algorithm  over  the  Prony  method, 
simulated  noisy  signals  were  analyzed  in  a  Monte-Carlo  experiment.  A  complex,  25 
data  point,  two  mode,  noiseless  signal  was  created  with  parameters  ci  =  C2  =  1, 
Ai  =  and  A2  =  Two  randomly  generated  25  element  vectors  were 

then  added  to  the  real  and  imaginary  elements  of  the  noiseless  signal,  respectively. 
The  randomly  generated  elements  of  each  vector  were  Gaussian  distributed  with 
mean  zero  and  variance  (T^/2.  The  variance  was  governed  by  SNR,  in  dB,  under  the 
formula 


(96) 


cr2  = 


|ciP 

10^ 


so  that 


(97) 


SNR  =  lOlogio  (J^)  . 


Two  hundred  realizations  of  the  noisy  signal  were  created  for  each  SNR.  The 
parameters  of  each  realization  were  estimated  with  both  the  LS  algorithm  and  the 
original  Prony  method.  For  each  parameter,  the  error  between  each  estimate  and 
the  actual  parameter  was  calculated  and  recorded  in  both  magnitude  and  phase. 
After  two  hundred  realizations,  the  magnitude  and  phase  bias  was  calculated  for 
each  parameter  by  summing  the  error  and  dividing  by  the  number  of  realizations. 
Also,  the  mean  square  error  (MSE)  was  calculated  by  summing  the  square  of  each 


35 


120 


-20' - ' - 1 - ' - ' - ' - ' - ' - ' - ' - 1 

0  10  20  30  40  50  60  70  80  90  100 

SNRdB 

Figure  1.  The  Ai  phase  inverse  MSE  of  the  LS  algorithm  and  the  original  Prony 
method  over  a  range  of  SNRs.  Also  plotted  is  the  CRB. 

error  and  dividing  by  the  number  of  realizations.  This  process  was  repeated  for  every 
SNR  ranging  form  0  dB  to  100  dB  in  2  dB  intervals. 

The  results  of  the  experiment  are  displayed  in  plots  of  magnitude  and  phase 
MSE  versus  SNR  and  magnitude  and  phase  bias  versus  SNR  for  each  parameter 
estimated.  The  MSE  plots  include  the  CRB  for  reference  and  the  ordinate  of  the 
MSE  plots  is  scaled  to  101ogio(l/MSE).  The  lOlog^Q  scaling  is  justified  because 
the  CRB  obeys  a  log-linear  relationship  with  respect  to  SNR.  In  the  plot,  the  linear 
bound  assists  with  visualization.  The  inverse  of  MSE  is  justified  by  the  convention 
of  plotting  superior  performance  above  inferior  performance. 

The  performance  of  both  estimators  can  be  assessed  in  Figures  1,  2,  3,  and  4. 
The  first  two  figures  pertain  to  the  MSE  and  bias  of  phase  errors  in  the  Ai  parameter 
estimates.  As  described  earlier,  we  consider  A„  estimates  to  be  the  primary  indicators 
of  estimator  performance  because  of  their  influence  on  the  remainder  of  most  LP 
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Figure  2.  The  Ai  phase  bias  of  the  LS  algorithm  and  the  original  Prony  method 
over  a  range  of  SNRs. 


Figure  3.  The  A2  phase  inverse  MSE  of  the  LS  algorithm  and  the  original  Prony 
method  over  a  range  of  SNRs.  Also  plotted  is  the  CRB. 
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Figure  4.  The  A2  phase  bias  of  the  LS  algorithm  and  the  original  Prony  method 
over  a  range  of  SNRs. 

algorithms.  For  this  complex  experiment,  performance  assessment  in  the  phase  of 
A,i  is  appropriate  because  the  phase  of  A„  is  the  only  discriminator  between  both 
modes  of  the  underlying  signal.  In  the  Ai  phase  inverse  MSE  plot,  Figure  1  we  see 
that  above  30  dB  the  LS  algorithm  estimates  are  consistently  closer  to  the  CRB 
than  the  original  Prony  method  estimates.  From  the  Ai  phase  bias  plot.  Figure  2, 
we  see  that  the  bias  becomes  erratic  below  70  dB  for  the  original  Prony  method  and 
below  30  dB  for  the  LS  algorithm.  Above  those  SNRs,  the  estimators  appear  to  be 
unbiased.  Consequently,  their  comparison  with  the  CRB  in  the  inverse  MSE  plot 
is  relevant.  Below  those  SNRs,  the  erratic  bias  makes  comparison  with  the  CRB 
inappropriate.  The  rapid  increase  in  MSE  (portrayed  as  a  decrease  in  the  inverse 
MSE  plot)  of  the  LS  algorithm  estimator  at  30  dB  is  called  the  noise  threshold.  The 
noise  threshold  is  prevalent  in  many  of  the  algorithms  considered  in  the  dissertation. 
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Because  the  associated  erratic  bias,  few  conclusions  can  be  drawn  below  the  noise 
threshold  from  the  inverse  MSE  plots. 

Figures  3  and  4  are  the  phase  inverse  MSE  and  bias  plots  for  estimates  on 
the  A2  parameter.  The  performance  of  both  estimators  on  the  A2  parameter  is  very 
similar  to  the  performance  on  the  Ai  parameter.  The  ultimate  direction  of  the  bias 
is  the  only  real  difference  and  because  it  occurs  below  the  noise  threshold  we  give  it 
no  attention. 

The  significant  contribution  from  the  experiment  is  verification  of  the  least 
squares  improvement  on  the  linear  prediction  methodology.  In  the  next  section,  we 
consider  another  modification  to  linear  prediction  that  realized  improved  estimator 
performance  but  ultimately  stifled  further  improvement  until  our  introduction  of  a 
more  appropriate  alternative. 

3.3  Overmodeled  Least  Squares 

Over  history,  numerous  researchers  have  realized  that  applying  LP  methodolo¬ 
gies  with  model  orders  in  excess  of  the  assumed  underlying  model  order  normally 
leads  to  improved  parameter  estimates,  e.g.,  (47,  72,  71,  41,  42,  57).  Bresler  and  Ma- 
covski  categorized  these  approaches  as  “heuristic  modifications  of  algorithms  that 
yield  exact  results  when  there  is  no  noise  or  the  amount  of  available  data  is  infi¬ 
nite  (4:1081).”  They  support  this  statement  by  recalling  that  the  cornerstone  of  all 
LP  methodologies,  the  LP  equation 

(98)  bos[Tn]  +  bis[m  —  1]  4- ...  -f  bi^s[m  —  N]  =  0, 

is  only  valid  in  the  noiseless  case.  Nevertheless,  the  practice  of  overmodeling  was 
and  still  is  accepted  by  many.  Methodologies  that  incorporate  overmodeling  avoid 
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iterations,  and  when  the  SNR  is  sufficient,  obtain  estimates  that  are  comparable  to 
maximum  likelihood  (38:1208-1209). 

The  premise  of  overmodeling  is  that  the  extraneous  modes  model  the  noise 
component  of  the  observed  data  while  the  N  actual  modes  appropriately  model  the 
signal  component  of  the  observed  data.  The  artificial  model  order,  P,  has  been 
dubbed  the  prediction  order.  Modifying  the  LS  algorithm  to  incorporate  overmod¬ 
eling  is  not  difficult.  We  refer  to  the  new  algorithm  as  overmodeled  least  square 
(OLS).  Instead  of  TV  -f-  1  coefficients,  the  LP  equation  in  OLS  has  as  P  -f  1  coeffi¬ 
cients.  Therefore,  the  LP  polynomial  roots  to  P  exponentials,  A„.  At  this  point  in 
the  algorithm,  a  requirement  exists  to  sort  the  TV  actual  A„  from  the  P  —  TV  extrane¬ 
ous  A„  estimates.  After  sorting,  the  algorithm  resumes  the  previous  LS  methodology 
for  estimating  the  amplitude  coefficients. 

Numerous  sorting  schemes  have  been  suggested.  Kumaresan  “qualitatively 
argued”  that  all  extraneous  A„  are  distributed  uniformly  around  the  inside  of  the  unit 
circle  (42:218-220).  Empirically,  this  appears  to  be  the  case  for  a  good  range  of  SNRs, 
but  exceptions  occur  as  SNR  decreases.  Kumaresan  used  this  argument  as  a  sorting 
criterion  for  parameter  estimation  problems  where  the  exponentials  are  constrained 
to  the  unit  circle  (sinusoids).  In  his  algorithm,  the  P  —  N  estimates  farthest 
inside  the  unit  circle  are  deemed  extraneous.  Kumaresan  and  Tufts  also  applied  the 
criterion  to  complex  signals  with  decaying  exponentials  (41).  They  realized  that,  in 
noiseless  conditions,  if  the  data  matrix  Y  is  created  with  the  data  in  reverse  order, 
the  resulting  LP  coefficients  root  to  estimates  that  are  the  inverse  of  the  TV  actual 
A„  values.  Therefore,  they  lie  outside  the  unit  circle.  However,  despite  the  reverse 
order  of  Y,  the  P  —  TV  extraneous  A„  estimates  still  typically  lie  inside  the  unit  circle. 
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Consequently,  Kumaresan  and  Tufts  developed  a  sorting  scheme  that  capitalized  on 
this  behavior. 

We  have  found  an  alternative  criterion,  known  as  the  energy  sort,  to  be  reliable 
and  in  our  opinion,  more  tractable.  Although  heuristic  as  well,  the  energy  sort 
has  been  successfully  applied  in  radar  applications  (60).  It  is  predicated  on  the 
calculation  of  the  amplitude  coefficients  with  the  unique  pseudo-inverse  solution 

(99)  c  =  {V'^VyW^y 

derived  earlier.  The  c  estimator  will  also  utilize  the  signal  vector,  y,  and  an  overmod¬ 
eled  matrix  Vover  to  appropriately  represent  the  overmodeled  amplitude  coefficients 
in  an  overmodeled  vector  Cover-  When  an  exponential,  A„,  is  extraneous  to  the  ob¬ 
served  signal,  y,  its  corresponding  amplitude  coefficient,  c„,  is  typically  small.  If  we 
estimate  the  overmodeled  c„  and  A„  with  the  LS  algorithm,  the  extraneous  mode 
estimates  can  be  sorted  and  separated  by  mode  energy.  The  energy  of  a  single  mode 
is  defined  as 


M-l 


(100) 


m=0 

Ic  |2 

,  |A„| 

7^1 

|c„pM, 

|A„i 

=  1 

Under  the  energy  sort  criterion,  the  N  modes  with  the  highest  energy,  En,  are 
considered  to  be  the  actual  modes  of  the  signal.  With  N  actual  modes  selected, 
the  amplitude  coefficient  estimator  is  reapplied  to  acquire  N  amplitude  coefficient 
estimates. 
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SNR  dB 

Figure  5.  The  Ai  phase  inverse  MSE  of  the  OLS  and  LS  estimators  over  a  range  of 
SNRs.  Also  plotted  is  the  CRB. 

Figures  5,  6,  7,  and  8  are  A„  phase  inverse  MSE  and  bias  plots  for  comparing 
the  performance  of  the  OLS  and  LS  estimators.  The  Monte-Carlo  experiment 
performed  to  create  the  plots  was  identical  to  the  previous  Monte-Carlo  experiment 
for  comparing  the  LS  algorithm  and  the  original  Prony  method.  A  prediction  order, 
P,  of  12  was  used  on  every  OLS  execution. 

In  the  phase  bias  plots,  we  see  an  extended  SNR  range  in  the  unbiased  region 
for  the  OLS  estimator.  In  the  phase  inverse  MSE  plots  we  see  a  corresponding 
SNR  range  where  the  OLS  estimator  performance  is  closer  to  the  CRB  than  the 
LS  estimator  performance,  but  as  SNR  increases,  the  OLS  estimator  performance  is 
inconsistent.  This  is  not  a  desirable  feature  for  parameter  estimators  in  low  noise 
scenarios. 

The  erratic  behavior  as  SNR  increases  can  be  attributed  to  a  near  rank  de¬ 
ficiency  in  the  overmodeled  data  matrix.  For  explanation  purposes,  let  Yover  and 
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Figure  6.  The  Ai  phase  bias  of  the  OLS  and  LS  estimators  over  a  range  of  SNRs. 


Figure  7.  The  A2  phase  inverse  MSE  of  the  OLS  and  LS  estimators  over  a  range  of 
SNRs.  Also  plotted  is  the  CRB. 
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Figure  8.  The  A2  phase  bias  of  the  OLS  and  LS  estimators  over  a  range  of  SNRs. 

Yover  represent  overmodeled  data  matrices  identical  to  Y  and  Y  except  that  their 
dimensions  are  expanded  from  the  model  order  N  to  the  prediction  order  P.  In  Ap¬ 
pendix  A,  we  discussed  how  the  theoretical  rank  of  Yover  is  P — due  to  the  stochastic 
nature  of  its  columns — even  though  the  underling  signal  matrix  rank  is  N.  As  SNR 
increases  and  the  noise  decreases,  the  contrast  in  rank  begins  to  influence  the  con¬ 
dition  of  Y^g^Yover  in  the  pseudo-inverse  estimator 

(101)  Lar  =  -  Y«„yo. 

Further  insight  into  this  problem  is  gained  by  observing  the  pseudo-inverse  esti¬ 
mator  in  terms  of  the  singular  value  decomposition  (SVD).  It  has  been  shown  (25:414) 
that  for  any  M  x  P  matrix  Y,  there  exists  an  M  x  M  unitary  matrix 

(102)  U  =  [ui,U2,--- ,um] 
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and  a,  P  X  P  unitary  matrix 


(103) 


such  that 


(104)  U^YW  =  S 

where  E  is  an  M  x  P  matrix  of  zeros  except  on  the  diagonal.  The  diagonal  entries 
of  E  are  real  and  non-negative.  When  M  >  P  they  are  ordered 

(105)  <^11^  <^22  ^  ^  PP  ^  0. 

If  the  rank  of  Y  is  less  than  P,  say  N  for  example,  then 


(106) 


=  cri\r+2,JV+2  =  •  •  •  =  (Tpp  =  0. 


Hereafter,  we  reference  the  diagonal  elements  of  E  as  the  singular  values  and  use  the 
abbreviated  index,  cr„.  Since  the  singular  values  are  ordered  by  size,  the  columns  of 
U  and  W  have  order  as  well  and  are  referenced  as  the  left  and  right  singular  vectors, 
Un  and  Vn,  respectively. 

With  the  SVD  defined,  we  can  return  to  the  pseudo-inverse  estimator  for  hover- 
With  the  guidance  of  Golub  and  Van  Loan  (19:242),  we  introduce  the  SVD  to  this 
problem.  Recall  that  we  desire  the  hfmer  that  minimizes  the  sum  of  the  squared  error 


(107) 


J(h 

over) 


(2/0  T  ^overhover')  (^/O  T  Yoverhover] 


Vo  T  ^overho 


2 

2  ■ 
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Let 


(108)  U^Y^erW  =  Yover 

as  defined  by  the  SVD.  Because  orthogonal  transformations  do  not  change  the  2- 
norm  of  a  vector, 

J{bover)  =  \\U^yO+{U^Y^erW)(W^bo,er)ll 

(109)  =  \\U^yo  +  ^^over{wHo,er)ll. 

Also  let  q;  =  W^bover,  so  that 

~  II  ff  “  2 

Jibover)  ~  II 2/0  "h  2 

P 

(110)  =  J2\^nyO  +  (^nOin\‘^- 

n=l 

From  the  last  expression  in  Equation  110,  we  see  that  a„  =  --^u^yo  is  the  minimal 
solution  to  the  LS  problem.  Therefore,  since  bover  =  Wa,  the  unique  minimum 
estimate  of  bover  is 

(111)  b„„  =  -W'E+„(/"So 

where  is  a  P  x  {M  —  N)  matrix  of  zeros  except  on  the  diagonal  where 

The  matrix  is  also  the  pseudo  inverse  of  ToDer-)  and  when  Yover  is  full  rank, 

=  -irs+„c/''s„ 

(112)  = 
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Since  the  underlying  signal  matrix  of  Yover  has  a  rank  of  N,  we  know  as  SNR 
increases,  the  smallest  P  —  N  singular  values  approach  zero.  Therefore,  the  inverse  of 
cr^+i  ■  ■ .  CTp,  on  the  diagonal  of  approach  oo.  This  action  corrupts  the  estimate 
of  bover  and  leads  to  an  erroneous  but  unique  solution.  To  alleviate  this  problem. 
Tufts  and  Kumaresan  (72)  propose  intervening  at  the  bover  estimate  and  forcing  the 
last  P  —  N  singular  values  to  zero.  The  effect  of  this  action  is  to  create  a  low  rank, 
N,  approximation  of  Ygver,  call  it  Ycrueri  where 

(113)  U^YoverW  =  llover. 

The  unitary  matrices,  U  and  tT,  are  the  original  unitary  matrices  from  the  SVD  of 
%ver)  and  the  matrix  J^over  is  the  original  Sever  with  the  last  P  —  N  singular  values 
set  to  zero.  Although,  with  a  rank  deficient  data  matrix,  a  unique  solution  no  longer 
exists,  the  SVD  can  be  utilized  to  obtain  a  minimum  2-norm  solution  for  bover-  For 
this  reason,  a  rank  deficient  data  matrix  is  preferred  over  a  near  rank  deficient  data 
matrix  (3:510-511). 

A  proof  of  the  minimal  2-norm  solution  follows  the  same  derivation  from  Golub 
and  Van  Loan  (19:242)  as  the  full  rank  LS  problem  with  the  SVD.  In  the  rank 
deficient  scenario. 


•7(5over)  — 


(114) 


^  Vo  ^over^ 

J2\^nyO  +  ^nOCnf  +  \^n  Vol^ 


71  =  1 


Tl  =  iV+l 
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because  ajv+i  =  ^n-\-2  =  . . .  =  o'p  =  0.  Again,  =  --^u^yo  obtains  the  minimal 
solution  to  J{bover)-  Therefore, 

(115)  Ler  =  -wiC,,,U^yo 

—  "I" 

is  a  minimal  2-norm  solution  for  bover-  Note  that  with  ,  the  pseudo- inverse 

is  calculated  even  with  a  rank  deficient  Ygver- 

In  1982,  Tufts  and  Kumaresan  used  the  SVD  and  a  low  rank  approximation 
of  Yfyver  for  the  superimposed  exponential  parameter  estimation  problem  to  effec¬ 
tively  remove  the  erratic  behavior  seen  in  the  OLS  estimator  at  high  SNR  (72).  In 
1987,  Rahman  and  Yu  (57)  used  the  SVD  and  a  low  rank  approximation  in  a  total 
least  squares  (TLS),  rather  than  a  least  squares,  methodology  for  estimating  the  LP 
coefficients.  With  this  adjustment,  they  were  able  to  attain  slightly  better  parame¬ 
ter  estimates.  Ironically,  four  years  before  Rahman  and  Yu,  Tufts  and  Kumaresan 
incorporated  the  same  adjustment  into  their  algorithm  (36),  but  the  equivalence  be¬ 
tween  their  new  algorithm  and  Rahman  and  Yu’s  algorithm  was  not  realized  until 
1991  (13).  We  will  now  introduce  total  least  squares  as  a  subtle  extension  to  the 
SVD  enhanced  OLS  algorithm  just  developed  in  this  dissertation.  Our  delivery  is 
unique  and  tractable. 

Conceptually,  TLS  dates  back  to  the  early  1900s.  Wide  spread  use  of  TLS  did 
not  occur  until  the  1980s  when  Golub  and  Van  Loan  utilized  the  SVD  to  increase 
the  method’s  ease  of  implementation  and  understanding  (18).  For  the  superimposed 
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exponential  problem,  TLS  can  be  applied  to  minimize  the  sum  of  the  squared  error 


(116) 


J(bover}  — 

Vo  “k  y>t)er^over 

1 

= 

Y 

over 

bover 

From  Appendix  A,  we  know  the  rank  of  V^er  is  P  + 1  even  if  it  is  near  rank  deficient. 
Therefore,  there  is  no  solution,  other  than  a  vector  of  zeros,  that  will  cause  J{bover) 
to  equal  0.  In  other  words,  a  vector  of  zeros  is  the  only  vector  in  the  null  space 
of  Yover-  The  minimizing  problem  of  Equation  116  can  be  recast,  equivalently,  as  a 
search  for  the  minimum  perturbation  of  Yover  so  that  [  1  is  a  solution  to 


(117)  {Yo,er  -  Ay^er) 

where  AYo^er  is  the  perturbation  (28:33).  For  the  nonzero  solution,  [  i  ]^, 

to  exist,  the  perturbation  matrix  must  be  such  that  {Yover  ~  AYoner)  has  a  rank 
less  than  P  +  1.  As  shown  previously,  the  SVD  can  be  used  to  create  a  low  rank 
approximation  of  y,.„er)  call  it  Yoveri  by  forcing  the  smallest  singular  values  of  y,„er  fo 
equal  zero.  The  Eckart-Young-Mirsky  matrix  approximation  theorem  (51)  proves,  in 
the  matrix  2-norm  and  the  Frobenius  norm,  that  a  low  rank  matrix  approximation 
created  with  the  SVD  accomplishes  the  rank  reduction  with  the  minimum  possible 
perturbation.  Consequently,  the  SVD  facilitates  the  TLS  methodology. 
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The  simple  least  squares  approach  can  also  be  recast  into  a  search  for  the 
minimum  perturbation  of  yo  so  that 

(118)  (yO  ^2/0)  T  ^over^over  ~  0 

where  Ayo  is  the  perturbation  (28:33).  The  prevailing  argument  for  TLS  is  that 
the  ability  to  perturb  the  entire  data  matrix  instead  of  just  one  column  of  the  data 
matrix  obtains  a  better  estimate  of  fcotierj  e.y.,  (57,  27,  13,  1,  6).  We  address  this 
issue  further  in  Chapter  IV,  but  for  the  overmodeled  LP  approach,  it  appears  TLS  is 
slightly  more  effective  than  LS.  We  believe  a  portion  of  the  improvement,  if  not  all  of 
the  improvement,  is  due  to  a  lesser  exalted  minimum  norm  solution  provided  by  TLS 
and  essential  to  the  overmodeled  application.  The  minimum  norm  solution  was  also 
identified  by  Golub  and  Van  Loan  in  their  introductory  TLS  article  (18:891-892). 
To  develop  it  we  need  to  review  another  feature  of  the  SVD. 

We  know  for  any  M  x  P  matrix  Y  with  rank  less  than  P,  say  N, 

U^YW  =  E 

(119)  YW  =  PS 

^  YWn  =  CFnUn- 

Therefore,  the  right  most  linearly  independent  vectors  of  W,  wn+i  •  ■  •  tSp,  span  the 
space  where  a^+i . .  .<Jp  equals  zero.  Consequently,  Ywn+i  .  ..Ywp  equals  0.  Thus, 
the  right  most  {N  +  1) ...  P  singular  vectors  of  W  are  a  basis  for  the  null  space  of 
Y.  In  other  words,  the  range  space  of  wjv+i  ...wp  is  the  null  space  of  Y. 

In  our  application,  if  we  only  require  the  low  rank  approximation  Yover  fo  have 
a  rank  1  less  then  the  rank  of  Ygver,  there  is  only  one  vector,  wp+i,  in  the  null  space 
oi  Yover-  If  we  normalize  that  vector  by  dividing  the  first  element,  into  all  the 
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elements  of  wp+i,  we  have  the  unique  minimum  solution  for  [  i  ]^.  Typically, 

in  the  overmodeled  LP  coefficient  estimation  problem,  we  desire  to  approximate 
Yover,  of  rank  P  +  1,  with  a  matrix  Yover,  of  rank  N,  where  N  is  significantly  less 
than  P.  Therefore,  the  null  space  of  Yover  is  the  range  space  of  wn+i  . .  .wp  and  an 
infinite  number  of  solutions  exists. 

To  develop  Golub  and  Van  Loan’s  technique  for  determining  the  minimum 
norm  solution  with  a  multi-dimensional  null  space,  let  the  P  X  1  vector 

(120)  Itls  =  -[®] 

7 

symbolize  the  minimum  norm  solution.  Also  let 


(121) 


W2 


Wn+1 


Wp+1 


represent  the  basis  of  all  solutions.  We  desire  a  linear  transformation,  H,  such  that 


(122) 


W2H  = 


(P+l)x(P-^■+l) 


Thus,  the  goal  of  H  is  to  rotate  the  energy  of  the  top  row  of  W2  into  7  so  that  one 
column  of  W2II  has  the  largest  7  in  the  range  space  of  W2.  With  the  maximum 
7,  the  solution,  =  ^[®],  is  minimized  for  any  norm  (18:892).  By  definition, 

a  readily  available  Householder  matrix  can  be  computed  to  accomplish  this  task. 
Additionally,  since  only  the  last  column  of  W2II  is  of  interest,  only  a  portion  of  E 
is  necessary  for  the  transformation.  (57:1443). 

In  review,  with  the  SVD  and  a  minimum  norm  solution  technique,  it  is  easy  to 
incorporate  total  least  squares  into  the  overmodeled  LP  methodology.  We  refer  to 
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Figure  9.  The  Ai  phase  inverse  MSE  of  the  OTLS,  OLS,  and  LS  estimators  over  a 
range  of  SNRs.  Also  plotted  is  the  CRB. 

the  improved  algorithm  as  overmodeled  total  least  squares  (OTLS).  In  the  algorithm, 
after  the  overmodeled  LP  coefficients  are  estimated  in  a  total  least  squares  sense, 
the  energy  sort  is  used  to  identify  the  extraneous  exponentials,  A„.  After  that,  the 
pseudo-inverse  is  used  to  estimate  the  amplitude  coefficients,  c„. 

Figures  9,  10,  11,  and  12  are  phase  inverse  MSE  and  bias  plots  for  comparing 
the  performance  of  the  OTLS  estimator  with  the  non  SVD  OLS  estimator  and  the 
regular  LS  estimator  derived  earlier.  Again,  the  same  Monte-Carlo  experiment 
as  that  for  OLS,  LS,  and  the  original  Prony  method  was  accomplished.  The  pre¬ 
diction  order,  P,  for  OTLS  was  12.  Prom  the  inverse  MSE  plots,  we  see  that  the 
OTLS  estimator  almost  achieves  the  CRB  and  performs  consistently  better  than  the 
previously  derived  estimators. 

Despite  the  improved  performance,  there  are  significant  drawbacks  to  over¬ 
modeling  in  general.  For  example,  to  attain  the  best  results,  the  OTLS  estimator 
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Figure  10. 


Figure  11. 
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The  Ai  phase  bias  of  the  OTLS,  OLS,  and  LS  estimators  over  a  range 
of  SNRs. 


The  A2  phase  inverse  MSE  of  the  OTLS,  OLS,  and  LS  estimators  over 
a  range  of  SNRs.  Also  plotted  is  the  CRB. 
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Figure  12.  The  Aa  phase  bias  of  the  OTLS,  OLS,  and  LS  estimators  over  a  range 
of  SNRs. 

was  executed  at  various  prediction  orders.  For  the  experiment,  a  prediction  order  of 
12  obtained  estimates  nearest  to  the  CRB.  Recall  that  for  the  CRB  to  be  plotted, 
knowledge  of  the  underling  signal  parameters  is  required.  With  unknown  param¬ 
eters,  the  proper  choice  of  P  may  be  critical  to  the  success  of  the  algorithm  and 
most  likely  will  introduce  another  heuristic  to  the  methodology.  Also,  the  increased 
degree  of  the  LP  polynomial,  brought  on  by  the  artificially  high  model  order,  is  dis¬ 
turbing.  Rooting  the  polynomial  for  the  A„  estimates  is  an  inherently  ill-conditioned 
numerical  procedure  and  it  is  well  known  that  rooting  errors  are  greatly  exacerbated 
by  increasing  the  polynomial’s  degree.  When  these  drawbacks  are  coupled  with  the 
fact  that  even  under  straight  forward  simulations,  the  estimator  was  unable  to  at¬ 
tain  the  CRB,  we  question  the  use  of  overmodeling  in  the  superimposed  exponential 
parameter  estimation  problem. 
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In  the  next  section,  we  introduce  our  modification  to  linear  prediction  which 
is  not  heuristic  and  does  not  utilize  overmodeling.  The  modification  is  completely 
couched  in  tractable  statistics.  In  fact,  the  resulting  cost  function  is  identical  to  the 
cost  function  derived  from  maximum  likelihood. 

3.4  Iterative  Generalized  Least  Squares 

The  development  that  follows  is  original  work  and  extends  on  the  LP  equation 
introduced  in  the  beginning  of  this  chapter.  Recall  the  equation 

(123)  l>Qy[m]  +  biy[m  -  1]  +  . . .  +  bNy[m  -  N]  =  e[m] 

as  a  noisy  modification  to  the  noiseless  LP  equation 


(124)  6os[m]  +  bis[m  —  1]  +  . . .  +  b}^s[m  —  N]  =  0. 


Also,  recall  that 


(126) 


2/[m]  =  5[m]  +  w[m] 
5[m]  =  y[m]  —  w[m]. 


If  we  substitute  the  last  expression  for  s[m]  into  the  noiseless  LP  equation,  we  get 
the  same  noisy  modified  relationship  as  in  Equation  123,  only  the  LP  equation  error 


(126)  e[m]  =  bow[m\  +  biw[m  —  1]  +  . . .  +  bNw[m  —  N]. 

Now,  the  LP  error,  e[m],  has  known  structure.  It  is  the  sum  of  weighted  Gaussian 
random  variables,  and  although  each  w[m]  is  uncorrelated  over  all  m,  each  e[m]  is 
correlated  with  the  ±iV  adjoining  elements  of  e[m]. 
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In  vector  notation, 


(127) 


Yb  =  e. 


With  Equation  126,  we  know  the  statistics  of  the  LP  equation  error  vector,  e.  Since 
each  w[m]  in  e[m]  has  mean  zero,  the  mean  vector  of  e  is  0.  The  LP  error  covariance 
matrix,  a'^R,  of  e  is  defined  as  the  (M  —  N)  x  (M  —  N)  matrix 


(128) 


E{e[N]e*[N]}  E{e[N]e*[N  +  1]} 
E{e[N +  l]e*[N]}  E{e[N  +  l]e*[N  +  1]} 


E{e{N]e*[M  -  1]} 
E{e[l]e*[M-  1]} 


£;{e[M- l]e*[iV]}  £;{e[M -  l]e*[l]}  •••  ^{e[M  -  l]e*[M  -  1]} 


where,  from  (126), 


(129) 


E{e[mi]e*[m2]}  = 

I  fZ)  -  ^])  (Y^Kw[m2  -  n] 

I  \»i=0  /  \ti=0 


(130)  mi  =  N,N +  1,...  ,M -1,  m2  =  iV,  iV  +  1, . . .  ,M  -  1. 


When  the  product  from  (129)  is  carried  out  for  each  element  of  the  matrix  a'^R 
and  the  uncorrelated  assumption,  E{w[mi]w[m2]}  =  0  for  mi  7^  m2,  is  invoked  for 
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the  cross  terms,  we  calculate 


(131)  Ef=o*>Ef>t+i-i.  i>3  ■ 

0,  |i-i|>Af 

k 

Alternatively, 

(132)  R  =  BB^ 

where,  as  defined  in  Chapter  II,  B  is  an  (M  —  N)  X  M  Toeplitz  matrix  of  LP 
coelRcients  in  the  form 

6o  h\  •  •  •  hjf  0  •  •  •  0 

0  6o  '  •  ) 

i  0 

0  •  •  •  0  6o  bi  •  •  •  bjf 

The  matrix  B  has  full  row  rank  M  —  N.  Therefore,  R  has  full  rank  M  —  N  and 
is  positive  definite.  A  Cholesky  decomposition  of  R  provides  a  nonsingular,  lower 
triangular,  (M  —  N)  x  {M  —  N),  matrix  Ri  such  that  the  LP  equation  error  covariance 
matrix  is  factored  as 

(134)  RiRf  =  R. 

The  LP  coefficients  in  R  are  unknown,  but  for  a  moment  assume  we  know  R. 
If  so,  the  inverse  of  Ri,  Rj'^,  can  be  used  to  linearly  transform  the  LP  error  vector,  e. 
A  linear  transformed  Gaussian  distributed  error  vector,  with  mean  vector  zero,  also 
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has  mean  vector  zero,  and  a  linear  transformed  Gaussian  distributed  error  vector, 
with  covariance  matrix  a'^R,  has  covariance  matrix  (61:59).  Because 

R  =  RiRf,  we  know 


(135) 


Rr^R  =  Rf 

Rf^RiR^Y  =  I- 


Therefore  the  covariance  matrix  of  Ri  is  aH  and  Rj  is  distributed 


(136) 


Rj-^e'^CN{0,aH). 


Since  6o  can  be  normalized  to  1  without  affecting  the  A„  estimates,  the  equation 

(137)  y6  =  e 
can,  once  again,  be  rearrange  to 

(138)  yo  +  Yb  =  e. 

This  implies 

(139) 


Rr'-yo  +  Rr'Yi  =  RT'e. 


Estimating  b  in  Equation  139  can  be  cast  as  a  linear  least  squares  problem.  In  this 
form,  we  know  the  unique  minimum  solution  is  obtained  with  the  pseudo-inverse 
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estimator 


b  =  -  ((fir'yfw'?))'' (flr'yfBr’so 

(140)  =  -{Y^R-^yy^Y^R-^. 

Recall  from  Chapter  II,  since  Rj'^e  is  Gaussian  distributed  with  mean  vector  zero 
and  covariance  matrix  aH,  the  pseudo-inverse  estimator  is  also  the  MVU  estimator. 
Linearly  transforming  a  least  squares  problem  to  decorrelate  the  error  vector  is  known 
as  generalized  least  squares  (GLS)  (64:60-64). 

In  our  application,  R  is  unknown  because  b  is  unknown.  To  overcome  this, 
we  turn  to  an  iterative  generalized  least  squares  (IGLS)  approach  which  is  similar 
to  the  IQML  algorithm  of  Chapter  II.  In  IGLS,  previous  estimates  of  b  are  used  to 
create  R  for  re-estimating  b  with  the  GLS  estimator  (140).  The  commonality  of  our 
IGLS  algorithm  and  the  IQML  algorithm  is  in  the  cost  function  they  both  attempt 
to  minimize.  Since 

(141)  Yb  =  e, 

we  know 

(142)  RY^Yb  =  Rf^e. 
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In  the  least  squares  problem,  we  wish  to  minimize  the  cost  function 


(143) 


J{b)  =  {RT^e)^  (RT^e) 

=  (Rj-^Yiy  [Rj-^Yb) 
=  b^Y^  [RiRfyWb 
=  ¥Y^{R)-^Yb 
=  b^Y^{BB^)-'^Yb. 


This  expression  for  J(6)  is  identical  to  the  maximum  likelihood  cost  function  L(b) 
for  the  superimposed  exponential  problem.  Attempting  to  minimize  L(b)  by  using  a 
prior  estimate  of  b  to  fix  {BB^)~^  and  then  solving  the  remaining  quadratic  problem, 
is  similar  to  using  a  prior  estimate  of  b  to  fix  Ry^  and  then  solving  the  generalized 
least  squares  problem.  Like  IQML,  we  typically  initialize  IGLS  with  a  least  squares 
estimate.  We  discuss,  in  detail,  the  intricacies  of  minimizing  the  cost  function  in 
Chapter  IV.  Figures  13,  14,  15,  and  16  are  phase  inverse  MSE  and  bias  plots  for 
comparing  the  performance  of  the  IGLS  algorithm  with  the  OTLS  estimator  and  the 
non  SVD  OLS  estimator  derived  earlier.  Again,  the  same  Monte-Carlo  experiment 
as  that  for  OTLS,  OLS,  LS,  and  the  original  Prony  method  was  accomplished.  The 
number  of  IGLS  iterations  utilized  in  every  execution  was  10.  From  the  figures, 
we  see  that  the  IGLS  algorithm  shares  a  common  noise  threshold  with  the  OTLS 
estimator,  but  above  the  noise  threshold,  the  IGLS  algorithm  attains  the  CRB.  The 
performance  of  IGLS  completely  corroborates  the  development  of  this  chapter  and 
the  maximum  likelihood  chapter. 
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Figure  13. 


Figure  14. 


The  Ai  phase  inverse  MSE  of  the  IGLS,  OTLS,  and  OLS  estimators  over 
a  range  of  SNRs.  Also  plotted  is  the  CRB. 


The  Ai  phase  bias  of  the  IGLS,  OTLS,  and  OLS  estimators  over  a  range 
of  SNRs. 
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Figure  15. 


Figure  16. 


The  A2  phase  inverse  MSE  of  the  IGLS,  OTLS,  and  OLS  estimators  over 
a  range  of  SNRs.  Also  plotted  is  the  CRB. 
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3.5  Chapter  Conclusion 

In  this  chapter,  the  methodology  of  linear  prediction  based  parameter  esti¬ 
mation  for  the  superimposed  exponential  problem  was  developed  from  its  earliest 
beginning.  The  introduction  of  least  squares  to  linear  prediction  was  explained,  and 
then  the  heuristic  path  of  overmodeling  was  examined.  We  concluded  that  although 
overmodeling  initially  improved  linear  prediction,  it  ultimately  inhibited  linear  pre¬ 
diction  from  attaining  truly  maximum  likelihood  quality  performance.  To  address 
this  problem,  we  introduced  a  methodology  that  demonstrated  how  the  noiseless 
LP  equation  can  be  modified  to  function  in  the  noise  corrupted  scenario  without 
heuristics.  The  result  is  a  statistically  sound,  tractable,  LP  based  algorithm  for  the 
parameter  estimation  problem  with  maximum  likelihood  performance.  We  call  the 
algorithm  iterative  generalized  least  squares. 

We  concede  IGLS  differs  little  from  the  algorithm  developed  with  maximum 
likelihood,  but  we  insist  our  development,  from  linear  prediction,  is  valuable.  Maxi¬ 
mum  likelihood  built  its  case  at  the  expense  of  linear  prediction  (4:1081).  Undoubt¬ 
edly,  this  tarnished  all  previous  linear  prediction  efforts.  By  giving  linear  prediction 
the  capability  to  possibly  obtain  the  MVU  estimator,  we  allow  the  principles  of 
the  LP  methodology  to  be  maintained  and  carried  to  new  applications.  For  some 
applications,  it  is  more  intuitive  to  construct  an  estimator  with  our  methodology 
than  with  the  maximum  likelihood  methodology.  The  application  motivating  our 
research — deep  level  transient  spectroscopy — is  in  this  category.  In  Chapter  VI,  we 
demonstrate  how  our  linear  prediction  based  methodology  facilitates  an  estimator 
for  DLTS  and  how  deriving  the  same  estimator  from  maximum  likelihood  is  dif¬ 
ficult.  Before  doing  so,  the  next  two  chapters  are  used  to  analyze  the  details  of 
implementing  IGLS. 
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IV.  Cost  Function  Minimization 


4 . 1  Chapter  Introduction 

In  Chapter  II,  we  identified  the  exact  maximum  likelihood  (ML)  cost  function 
for  superimposed  exponential  parameter  estimation.  In  the  same  chapter,  we  out¬ 
lined  an  iterative  quadratic  minimization  algorithm  for  minimizing  the  ML  cost  func¬ 
tion.  The  algorithm  is  known  as  iterative  quadratic  maximum  likelihood  (IQML). 
In  Chapter  III,  we  developed  our  own  linear  prediction  (LP)  based  estimation  algo¬ 
rithm.  The  algorithm  ultimately  attempts  to  minimize  the  same  ML  cost  function. 
We  called  it  iterative  generalized  least  squares  (IGLS).  In  this  chapter,  we  further 
analyze  and  compare  the  IQML  and  IGLS  algorithms.  We  also  introduce  a  simi¬ 
lar  total  least  squares  (TLS)  based  algorithm  for  minimizing  the  ML  cost  function. 
The  iterative  total  least  squares  (ITLS)  algorithm  is  an  original  contribution  of  this 
dissertation.  By  introducing  ITLS,  we  can  address  TLS  without  the  heuristic  of  over¬ 
modeling  and— by  comparing  IQML,  IGLS,  and  ITLS — bring  considerable  insight  to 
the  minimizing  task. 

Before  continuing,  we  need  to  re-emphasize  an  important  issue.  Despite  the 
effectiveness  of  the  IQML,  IGLS,  and  ITLS  algorithms,  they  do  not  directly  minimize 
the  ML  cost  function.  Even  in  its  simplified — LP  coefficient  based — form,  the  ML 
cost  function, 

(144)  L{b)  =  b^Y^{BB^)-^Yb, 

is  nonlinear.  When  a  previous  estimate  of  b  is  used  to  calculate  and  fix  (BB^)~^, 
the  cost  function  has  been  “linearized”.  All  three  algorithms  are  variations  of  this 
assumption.  They  are  not  guaranteed  to  converge,  and  even  if  they  converge,  they 


64 


are  not  guaranteed  to  obtain  the  global  minimum  (38:1218).  Excellent  empirical 
performance  is  our  best  measurement  for  acceptance. 

4-2  Iterative  Quadratic  Maximum  Likelihood 

When  is  calculated  with  a  previous  estimate  of  b,  the  expression 

(145)  Y^{BB^)-'^Y 
is  Hermitian,  and 

(146)  b^Y^(BB^)-^Yb 

has  quadratic  form.  To  avoid  the  trivial  solution  while  minimizing  Equation  146, 
the  vector  b  must  be  constrained.  One  way  to  constrain  b  is  to  insist  b^b  =  1  or 
||5||2  =  1.  Because  we  eventually  root  the  LP  polynomial  formed  from  6,  we  are 
not  concerned  with  the  actual  values  of  in  6,  but  rather  the  ratios  between  them. 
Therefore,  the  nontriviality  constraint  b^b  =  1  is  reasonable. 

Before  explaining  how  to  minimize  the  ML  cost  function  with  the  b^b  —  1 
constraint,  we  need  to  note  that  in  the  past,  researchers  have  minimized  the  cost 
function  with  a  different  nontriviality  constraint,  e.g.  (39,  4,  40,  8).  Because  recent 
implementations  of  IQML,  (52,  26),  utilize  the  b^b  =  1  nontriviality  constraint,  we 
are  associating  the  b^b  =  1  constraint  with  IQML  for  this  dissertation. 

With  this  established,  note  b^b  is  a  real  scalar  function  of  b,  and 


(147) 


WY^{BB^)-'^Yb 


is  also  a  real  scalar  function  of  b,  we  can  introduce  a  real  Lagrange  multiplier,  A,  and 
recast  this  problem  to  one  of  minimizing  the  Lagrangian  (70:691-694) 


(148) 


Mh,  X)  =  Fy®(SB*)-*y5  +  ac(B) 


where 


(149) 


C(6)  =  1  -  6''6. 


At  a  stationary  point,  we  expect 


(150) 


aL(6,A) 

db* 


=  0 


and 


(151) 


aL(6,A) 

dX 


=  0. 


From  Appendix  B,  we  know 

(152)  =  Y^{BB^)-^Yb  -  Xb, 


and  by  definition,  we  know 


(153) 


dL{b,X) 

dX 


1  -  b^b. 


Therefore,  at  a  stationary  point, 


(154) 


Y^iBB^y^Yb  =  Xb 
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and 


(155)  b^b  =  1. 

Equation  155  is  the  desired  constraint,  and  Equation  154  is  an  eigenvalue-eigenvector 
expression.  The  Lagrange  multiplier,  A,  is  an  eigenvalue  of  There¬ 

fore,  the  eigenvector,  b,  associated  with  the  minimum  eigenvalue,  A„,  from  the  eigen¬ 
value  decomposition  of  Y^(BB^)~^Y  is  the  unique  minimizing  solution  for  b. 

The  IQML  algorithm  is  initialized  with  a  previous  estimate  of  b  (typically 
from  a  least  squares  solution).  With  that  estimate,  Y^{BB^)~^Y  is  calculated 
and  eigenvalue  decomposed.  From  the  decomposition,  the  minimizing  eigenvector  is 
selected  as  the  next  estimate  of  b.  The  process  is  continued  until  an  iteration  limit 
or  minimum  estimate  change  limit  is  met.  Maximum  estimate  improvement  usually 
occurs  in  the  first  few  iterations.  When  the  signal  consists  of  pure  sinusoids  in  noise, 
as  is  the  case  in  radar  applications,  the  algorithm  can  be  initialized  with  an  estimate 
obtained  from  a  spectral  analysis  of  the  signal. 

4-3  Iterative  Total  Least  Squares 

At  this  time,  we  introduce  our  iterative  total  least  squares  approach  for  min¬ 
imizing  the  ML  cost  function  because  it  is  analytically  equivalent  to  IQML.  As  in 
Chapter  III,  let 

(156)  RiRf  =  R^  BB^ 
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where  Ri  is  the  nonsingular,  lower  triangular,  (M  —  N)  x  (M  —  N),  Cholesky  de¬ 
composition  of  R.  Now  we  can  re-express  the  ML  cost  function  as 

L(b)  =  WY^{BB^)-^Yb 
=  WY^{RiRf)-^Yb 

=  b^Y^{Rj-YRr^yb 

(157)  =  WRT^YbWi 

Recall  from  the  singular  value  decomposition  (SVD),  there  exists  unitary  matrices 
U  and  W,  and  a  diagonal  matrix  E,  such  that 

(158)  y  =  U^W^. 

Therefore, 

Y^Y  = 

= 

(159)  =  WEEW^. 

This  implies 

(160)  Y^Ywn  =  alwn 

Using  the  SVD  to  decompose  Rj'^Y  from  Equation  157,  let 

(161)  RfV  = 
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so  that 


{Rr^Yf{Rr^Y)w^  =  (Y^{RrYRr^Y)wn 

=  Y^{BB^YYwn 

(162)  =  alwn. 

This  relationship  is  exactly  the  same  as  the  eigenvalue-eigenvector  relationship, 
Equation  154,  developed  in  the  previous  IQML  section.  Therefore,  through  the 
Lagrangian,  we  know  the  singular  vector  u;„  associated  with  the  minimum  singular 
value,  (T„,  is  the  minimizing  solution  for  b.  In  the  vein  of  TLS,  we  say  that  the 
minimum  perturbation  of  R]'^Y — that  reduces  its  full  column  rank  by  one  and  al¬ 
lows  the  right  most  singular  vector  of  W  into  the  null  space — occurs  when  a  low 
rank  approximation  of  RJ'^Y  is  created  by  zeroing  the  smallest  singular  value  of  its 
SVD  (51).  Therefore,  the  singular  vector  Wiv+i  of  RJ'^Y  is  equal  to  the  eigenvector 
of  Y^{BB^)~^Y  associated  with  the  smallest  eigenvalue,  and  is  also  the  minimiz¬ 
ing  solution  of  the  linearized  ML  cost  function.  Consequently,  the  ITLS  algorithm 
is  identical  to  the  IQML  algorithm  except  that  the  eigenvalue  decomposition  of 
Y^{BB^)~^Y  is  replaced  with  the  singular  value  decomposition  of  R^^Y. 

The  ITLS  and  IQML  equivalence  was  demonstrated  to  add  insight  to  the 
h^h  =  1  constrained  approach  to  minimization.  For  example,  researchers  in  TLS 
have  developed  a  closed  form  expression  for  6  based  on  the  assumption  that  6o 
can  be  normalized  after  solving  for  h  (28:36-37).  With  the  demonstrated  equiva¬ 
lence,  this  insight  applies  to  IQML  and  ITLS  as  well.  Assume,  after  normalization, 
^  =  [  1  Y ■  Recall  that  when  h  is  the  minimizing  solution, 

(163)  (RT^Y)’’(RJ'Y)1  = 
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Therefore, 


(164) 


.Rr‘!/o;flr'y]''[«r'so;flr'y] 


{Rr'yo)''(Rr%)  (Rr%f(Rr'Y) 

(Rr'Y)^(Rr%)  (Rr'Y)’’(Rr‘Y) 


From  the  bottom  row  of  Equation  164, 


2 

iV+l 


1 

b 


(165)  =>  6  =  -((B,-iy)»(ii,->y)-^2^,l)"'(B,->f)*i{,-i5„ 

=S-  6  =  -  (y "B-'y  -  ■'  Y«R-%. 

Only  the  smallest  singular  value  of  is  necessary  to  express  6  in  a  closed  form 
solution. 

Notice  the  closed  form  solution  is  not  used  in  either  the  IQML  or  the  ITLS 
algorithms.  It  was  developed  here  to  provide  insight  in  the  algorithm  analysis  section 
at  the  end  of  this  chapter.  Also  note,  6o  =  1  is  not  constrained  when  determining 
the  smallest  singular  value.  It  is  only  a  result  after  assuming  bo  of  the  solution  b  can 
be  normalized.  In  the  next  section,  6o  =  1  is  a  constraint  for  minimization. 


4-4  Iterative  Generalized  Least  Squares 

In  Chapter  III,  we  developed  the  IGLS  algorithm  from  linear  prediction.  For 
continuity  with  this  chapter,  we  recast  the  IGLS  algorithm  as  a  development  from 
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the  ML  cost  function.  Recall  from  the  previous  section  that 


Lib)  =  b^Y^{BB^)-'^Yb 

(166)  =  \\RT^Yb\\l 

where  RiRf  =  BB^ .  For  IGLS,  a  previous  estimate  of  b  is  also  utilized  to  calcu¬ 
late  and  fix  Ri,  but,  instead  of  the  b^b  =  1  nontriviality  constraint,  the  bo  of  b  is 
constrained  to  be  one.  With  this  constraint,  we  have  the  familiar  relationship 

(167)  R^^Yb  =  Rj-%  +  Rj-^Yb  =  Rf 

for  minimizing  the  linearized  ML  cost  function.  In  Chapter  III,  we  showed  that  the 
pseudo-inverse  estimator 

i  =  -{(Rr'Yf(Rr'Y)y'{Rr'YfRr% 

(168)  =  -{Y^  R-^Y)-^Y^  R-%. 

is  the  unique  minimum  solution  for  the  sum  of  the  squared  error,  {Ry^e)^{R^^e), 
and  that  the  Ry^  pre-multiply  has  a  decorrelating  affect  on  the  LP  equation  error 
vector,  e.  In  Chapter  II,  we  showed  that  the  pseudo-inverse  estimator  is  the  minimum 
variance  unbiased  (MVU)  estimator  for  uncorrelated  processes. 

What  naturally  follows,  is  an  iterative  algorithm  where  the  previous  estimate 
of  b  is  used  to  create  ^  for  a  new  estimate  of  b  with  the  pseudo-inverse  estimator. 
Each  iteration  is  known  as  a  generalized  least  squares  solution  and,  in  theory,  as 
each  b  estimate  improves,  a  better  decorrelating  matrix,  RJ'^,  is  created.  Like  IQML 
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and  ITLS,  the  algorithm  is  typically  initialized  with  a  least  squares  (LS)  estimate 
for  b,  and  maximum  estimate  improvement  usually  occurs  in  the  first  few  iterations. 

We  have  now  developed  three  similar  algorithms  for  attempting  to  minimizing 
the  ML  cost  function — IQML,  ITLS,  and  IGLS.  In  the  next  section,  we  analyze  the 
performance  of  each  of  the  algorithms. 

4.5  Algorithm  Analysis 

A  Monte-Carlo  experiment  was  conducted  with  the  same  parameters  used 
throughout  Chapter  III.  Two  hundred,  25  point,  noisy,  data  vectors  were  created — 
with  underlying  parameters  ci  =  C2  =  1,  Ai  =  and  Ai  =  e-^'2'n^(-22) — every 

2  dB  of  SNR  between  0  dB  and  50  dB.  At  each  SNR,  10  iterations  of  each  algorithm 
were  performed  on  each  realization.  In  every  case,  the  algorithms  were  initialized 
with  a  least  squares  estimate. 

Figures  17,  18,  19,  and  20  are  phase  inverse  MSB  and  bias  plots  for  comparing 
the  performance  of  the  three  algorithms.  As  expected,  every  plot  from  the  ITLS 
algorithm  is  identical  to  that  of  the  IQML  algorithm.  From  this,  we  conclude  that 
numerical  differences  between  the  singular  value  approach  of  ITLS  and  the  eigenvalue 
approach  of  IQML  can  not  be  visualized.  On  the  other  hand,  a  difference  between 
the  IGLS  algorithm  and  the  other  two  algorithms  can  be  visualized.  We  note  that 
the  IGLS  algorithm,  with  its  60  =  1  nontriviality  constraint,  obtains  slightly  better 
performance  than  the  ITLS  and  IQML  algorithms,  with  their  h^h  =  1  nontriviality 
constraint. 

The  potential  for  better  performance  with  the  60  =  1  constraint  was  identified 
in  Golub  and  VanLoan’s  introductory  TLS  article  (18:888-890),  and  complete  stud¬ 
ies  of  the  subject  have  been  accomplished  by  VanHuffel  and  Vandewalle  (29)  and 
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Figure  17. 


Figure  18. 


The  Ai  phase  inverse  MSB  of  the  IGLS,  ITLS,  and  IQML  algorithms 
over  a  range  of  SNRs.  Also  plotted  is  the  CRB. 


The  Ai  phase  bias  of  the  IGLS,  ITLS,  and  IQML  algorithms  over  a  range 
of  SNRs. 


’3 
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Figure  19.  The  A2  phase  inverse  MSE  of  the  IGLS,  ITLS,  and  IQML  algorithms 
over  a  range  of  SNRs.  Also  plotted  is  the  CRB. 


Figure  20.  The  A2  phase  bias  of  the  IGLS,  ITLS,  and  IQML  algorithms  over  a  range 
of  SNRs. 
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Stewart  (68).  Analytically,  the  dilference  between  the  IGLS  algorithm  and  the  other 
two  algorithms  can  be  seen  in  the  IGLS  pseudo-inverse  estimator 


(169) 


hais  =  -(Y^R-'Yy'Y^R-'ya 


and  the  closed  form  expression  for  b  developed  in  the  ITLS  section 


(170) 


^ITLS/IQML 


=  -  (y^r-^y  -  ^  y^r-H 


The  subtraction  of  is  the  only  difference  between  the  three  estimators.  The 

ramification  of  the  subtraction  is  now  considered.  Recall  from  the  noiseless 

LP  equation, 


(171) 


sq  +  Sb  =  0 


(172) 


Sb=0. 


In  Appendix  A  we  show  that  the  rank  of  S  is  N.  Therefore,  the  smallest  singular 
value  of  S,  (Tn+i,  is  0.  Equation  171  is  said  to  be  consistent  because 


(173) 


Sb  =  -So- 


From  the  estimators  in  Equations  169  and  170,  we  see  that  if  ajf+i  =  0,  then 
biGLS  =  biTLS/iQML-  When  noise  is  considered. 


(174) 


VQ  +  Yb^  0. 
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The  smallest  singular  value  of  Y  is  not  zero,  and  in  fact,  as  SNR  decreases,  ajv^+i 
increases.  The  linear  problem  is  said  to  be  conflicting  and  when  is  large,  the 
problem  is  highly  conflicting.  When  the  closed  form  expression  for  biTis/iQML  is 
considered  for  a  highly  conflicting  problem,  the  subtraction  on  the  diagonal 

of  the  matrix  Y^R~^Y  is  significant.  This  is  an  ill-conditioning  effect  on  a  matrix 
that  is  to  be  inverted.  Golub  and  VanLoan  call  it  deregularization  and  note  that  the 
condition  of  a  TLS  problem  is  always  worse  than  the  condition  of  the  corresponding 
LS  problem  (18:889).  Stewart  in  (68)  shows  that  least  squares  estimates  and  total 
least  squares  estimates  are  equivalent  within  second  order  terms,  and  he  asserts 
there  is  no  reason  to  prefer  using  TLS  because  of  the  potential  ill-conditioning.  In 
our  application,  we  know  (Tj\r+i  is  large  when  SNR  is  low.  Therefore,  we  actually 
discourage  the  use  of  TLS  related  algorithms  like  ITLS  and  IQML. 

With  regards  to  the  overmodeled  total  least  squares  (OTLS)  algorithm  devel¬ 
oped  in  Chapter  III,  we  disagree  with  the  claim  that  the  slightly  improved  perfor¬ 
mance  of  OTLS,  over  the  other  overmodeled  approaches,  is  due  to  an  increase  in 
perturbation  capability  (57,  27,  13,  1,  6).  Given  the  insight  we  have  gained  in  this 
chapter,  the  most  likely  reason  for  improved  performance  in  OTLS  is  the  readily 
available  minimum  norm  solution  for  the  near  rank  deficient  structure  artificially 
induced  by  overmodeling. 

Returning  to  the  task  of  minimizing  the  exact  ML  cost  function,  we  must 
recognize  a  case  where  the  bo  =  1  nontriviality  constraint  is  inappropriate.  Re¬ 
searchers  focused  on  estimating  the  parameters  of  exponential  constrained  to  the 
unit  circle  (sinusoids)  have  found  the  performance  of  IQML  can  be  significantly  im¬ 
proved  by  placing  an  additional  constraint,  other  than  nontriviality,  on  the  vector 
b  (39,  4,  52,  66).  It  was  recognized  in  (39)  that  for  the  roots  of  b  to  lie  on  the  unit 
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circle,  the  LP  coefficients  of  b  must  posses  conjugate  symmetry 
(175)  h  =  b*^_„  k  =  0,l,...,N. 

The  symmetry  constraint  can  be  implemented  in  various  ways,  but  it  was  noted 
by  Nagesha  and  Kay  that  the  5o  =  1  nontriviality  constraint  can  conflict  with  the 
symmetry  constraint  (52).  The  conflict  arises  because  constraining  bo  to  one  also 
constrains  the  imaginary  component  of  bo  to  zero.  A  symmetry  constraint  with  the 
6o  =  1  constraint  can  be  too  restrictive.  However,  when  the  symmetry  constraint 
is  applied  with  the  b^b  =  1  constraint,  no  assumption  about  bo  is  made,  and  the 
IQML  algorithm  can  accurately  estimate  the  parameters  of  any  two  closely  spaced 
sinusoids  at  SNRs  below  10  dB  (52).  Accurate  estimates  at  those  SNRs  are  5  to  10 
dB  below  the  noise  threshold  of  the  IGLS  algorithm  with  only  the  6o  =  1  constraint. 

Unfortunately,  our  research  is  not  limited  to  exponentials  on  the  unit  circle. 
Therefore,  we  utilize  the  6o  =  1  nontriviality  constraint  implemented  with  the  IGLS 
algorithm. 

4-6  Chapter  Conclusion 

In  this  chapter,  we  considered  the  details  of  minimizing  the  exact  ML  cost 
function  with  the  IQML  and  IGLS  algorithms.  We  separated  the  two  approaches  as 
one  that  applies  the  b^b  =  1  nontriviality  constraint  (IQML)  and  one  that  applies  the 
6o  =  1  nontriviality  constraint  (IGLS).  We  also  introduced  our  own  ITLS  algorithm 
and  demonstrated  its  equivalence  to  IQML.  Because  of  the  closed  form  solution  of 
TLS,  we  were  able  to  bring  considerable  insight  to  the  minimizing  task  with  the 
ITLS  algorithm. 
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We  note,  in  signal  processing,  much  has  been  written  about  the  b^b  =  1 
and  the  feo  =  1  nontriviality  constraint,  e.g.,  (39,  4,  40,  52).  In  the  most  recent 
correspondence,  Nagesha  and  Kay  conclude  that  the  physical  significance  of  the 
coefficients  of  b  usually  suggest  an  appropriate  choice  (52).  Because  of  the  potential 
ill-conditioning  associated  with  the  b^b  =  1  constraint,  we  strengthen  the  conclusion 
by  saying,  the  6o  =  1  constraint  should  always  be  used  unless  the  physical  significance 
of  the  coefficients  dictates  otherwise. 
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V.  Real  Exponentials 


5.1  Chapter  Introduction 

From  Chapter  II  we  know  the  maximum  likelihood  solution  is  attained  when 
the  cost  function 


L{V)  =  y^il-V{V^V)-W^)y 

(176)  =  y^y-y^V{V^V)-^V^y 

is  minimized.  For  the  particular  scenario  of  parameter  estimation  with  signals  con¬ 
sisting  of  a  sum  of  real  exponentials,  we  now  elaborate  further.  First,  the  Hermitian 
transpose,  can  be  replaced  with  the  traditional  transpose,  Also,  because  y^y 
is  not  a  function  of  V,  the  Maximum  Likelihood  solution  can  be  attained  by  finding 
the  V  that  maximizes 

(177)  L{V)  =  fV{V'^V)-W^y. 

Additionally,  we  restrict  ourselves  to  the  study  of  decaying  signals  (transients).  This 
implies  that  “stable”  exponentials,  A„,  are  considered  only  on  the  open  interval  (0, 1). 
Exponentials  with  A„  =  1  (constants),  are  addressed  in  the  deep  level  transient 
spectroscopy  chapter  of  this  dissertation.  Finally,  the  amplitude  coefficients,  c„, 
must  also  be  real,  but  may  be  negative  as  well  as  positive. 

With  a  building  block  approach  in  mind,  we  begin  this  chapter  with  an  in 
depth  analysis  of  maximum  likelihood  estimation  for  “simple”  signals  comprised  of 
one  real  exponential.  We  then  extend  to  the  important  case  of  two  real  exponentials, 
and  conclude  with  the  case  of,  N,  real  exponentials.  For  comparison  with  iterative 
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generalized  least  squares  (IGLS),  we  develop  maximum  likelihood  algorithms,  unique 
to  the  specific  cases,  that  realize  a  slightly  more  accurate  ML  solution.  Ultimately 
though,  we  conclude  that  the  IGLS  algorithm  is  a  very  robust  and  efficient  estimator 
for  the  superimposed  real  exponentials  problem. 

The  analysis  that  follows  is  original  work.  Additionally,  the  maximum  like¬ 
lihood  algorithms  developed  for  the  one  and  two  mode  cases  can  not  be  found  in 
the  literature.  They  contribute  to  signal  processing  as  more  appropriate  maximum 
likelihood  estimators  when  the  real  exponential  problem  can  be  scoped  to  a  specific 
case,  and,  more  importantly,  they  serve  as  an  excellent  benchmark  for  verification  of 
the  multi-mode  IGLS  algorithm. 

5.2  One  Real  Exponential 

For  the  one  mode  problem,  N  =  1,  the  signal  model  is 

(178)  y  =  v{\i)ci  +  w 

where 

(179)  ^(Ai)=[l  Ai  Xj  •••  X^-^f 

is  appropriate.  Therefore,  the  cost  function,  L,  is  explicitly 

L{X)  =  y'^v{iFv)~^v^y 

{fvf 

v^v 

where  the  Ai  reference  is  dropped  from  the  v{Xi)  symbol. 
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2.8 


Figure  21.  One  mode  real  exponential  cost  function  evaluations  over  the  range  of 
A,  (0, 1)  at  various  SNRs.  The  underlying  signal’s  true  exponential  is 
A  =  .8 


The  behavior  of  the  cost  function  L,  over  the  range  of  A,  (0, 1),  for  various  SNRs 
is  displayed  in  Figure  21.  The  25  data  point  observation  vector,  y,  was  generated 
with  c  =  1,  A  =  .8,  and  the  standard  definition 


(181) 


cr2  = 


10  1® 


Recall  that  the  noise  vector  w  is  composed  of  M  Gaussian  random  variables,  u;[m], 
uncorrelated  across  all  m  and  identically  distributed  with  mean  zero  and  variance 

Ct2. 

From  the  figure,  it  appears  the  cost  function  L(A)  evaluated  over  the  range  of 
A  is  unimodal  until  the  SNR  is  very  low.  The  exact  SNR  where  L(A)  becomes  multi¬ 
modal  is  dependent  on  the  specific  observed  realization  y  and  therefore  is  ultimately 
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random.  Even  though  the  SNR  threshold  for  multi-modality  is  random,  we  have 
observed  a  rather  small  variance  in  its  value  over  many  realizations. 

5.2.1  Unimodal  Search.  Our  first  efforts  for  finding  the  A  that  maxi¬ 
mizes  the  cost  function  L  were  for  the  case  of  observations  y  above  the  multi-modal 
SNR  threshold.  Therefore,  we  used  a  unimodal  search  algorithm.  Following  the 
developments  in  Forsythe,  Malcolm,  and  Moler  (17:179-182)  and  Cheney  and  Kin¬ 
caid  (7:457-463),  we  settled  on  a  golden  section  search  algorithm  for  accomplishing 
this  task. 

By  definition,  any  continuous  unimodal  function,  /,  over  an  interval  [a,  b]  has 
one  maximum  (or  minimum).  With  two  evaluations  of  /  at  a'  and  fe',  where  a  <  a!  < 
b'  <  b,  we  can  determine  if  the  maximum  is  in  the  interval  [a,  fe'],  if  f{a')  >  f{b'),  or 
in  the  interval  [o',  6],  if  f{a')  <  f{b').  Iteratively,  we  rename  the  interval  with  the 
maximum,  to  [a,  6],  evaluate  /  at  two  new  points,  a'  and  b',  inside  the  new  interval, 
and  isolate  the  maximum  to  a  new  [a,b']  or  [o',  6].  At  each  iteration,  the  new  search 
interval,  called  the  interval  of  uncertainty,  is  reduced. 

The  goal  of  an  efficient  search  plan  is  to  reduce  the  interval  of  uncertainty 
with  the  minimum  number  of  function  evaluations.  Intuitively,  we  can  evaluate  / 
at  a'  =  a  -I- 1 (6  —  a)  and  b'  =  b—  |(6  —  a)  and  reduce  the  interval  of  uncertainty  by 
I  ~  .3333  at  every  iteration  with  two  evaluations.  Alternatively,  we  can  evaluate  / 
at  a'  =  a  4-  (1  —  r)(b  —  a)  and  b'  =  b  —  (1  —  r)(b  —  a)  where 

(182)  r  =  ~  -6180 
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Figure  22.  Iteration  of  the  golden  section  search  identifying  the  interval  of  uncer¬ 
tainty  reduction  and  the  common  function  evaluation  between  itera¬ 
tions.  The  maximum  resides  in  the  interval  [a,b']  of  iteration  k. 

and  satisfies  the  quadratic  equation 

(183)  rVr-l  =  0. 

This  golden  section  search  reduces  the  interval  of  uncertainty  by  a  factor  of  (1  —  r)  w 
.3820  at  every  iteration  and  eliminates  one  function  evaluation  in  the  next  iteration. 
The  eliminated  function  evaluation  is  inherent  in  the  property  that  either  f{a')  of 
the  current  iteration  will  equal  f{b')  of  the  next  iteration  or  f{b')  of  the  current 
iteration  will  equal  f{a')  of  the  next  iteration  depending  in  which  interval,  [a,  b']  or 
[a! ,  6],  the  maximum  is  determined  to  lie  in. 

Consider  Figure  22,  where  the  maximum  is  determined  to  be  in  the  interval 
[a,  b'].  Let  I  =  b  —  a  and  V  =  b'  —  a.  In  the  current  iteration, 

(184)  a'  =  a+{l-r)l. 
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In  the  next  iteration, 

(185)  b'  =  b  —  {I  —  r)l'  =  a  +  rl'. 

Since  I'  =  rl, 

(186)  b'  =  a  +  r^l  =  a  +  {r  —  1)1. 

Therefore,  a'  of  the  first  iteration  equals  b'  of  the  next  iteration. 

The  large  reduction  in  the  interval  of  uncertainty  at  every  iteration  and  the 
eliminated  function  evaluation  at  the  next  iteration  make  this  search  plan  close  to 
optimal  (17:182).  The  plan  is  known  as  the  golden  section  search  because  of  its 
association  with  the  golden  ratio,  j  «  1.6180.  Its  performance  against  the  IGLS 
algorithm  is  illustrated  in  the  inverse  MSE  and  bias  plots  of  Figures  23  and  24. 
For  the  figures,  400  Monte-Carlo  realizations  of  y  were  created  under  the  same  one 
mode  signal  and  noise  parameters  as  before. 

The  performance  of  the  golden  section  search  and  IGLS  algorithms  can  be 
differentiated  by  the  IGLS  estimator’s  dependence  on  a  good  initial  estimate,  and 
its  relatively  complex  update  routine.  At  low  SNR  these  factors  contribute  to  an 
inferior  performance  when  compared  to  the  golden  section  search  plan.  However,  the 
performance  of  IGLS  is  quite  good  when  recalling  that,  unlike  the  one  dimensional 
golden  section  search,  IGLS  has  robust  multi-exponential  capability. 

Multi-modal  Search.  To  assess  the  ramifications  of  the  unimodal  as¬ 
sumption,  an  estimator,  capable  of  determining  the  maximum  of  a  multi-modal  cost 
function  in  one  variable,  was  derived.  First,  we  began  by  considering  the  maximum 
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SNR  dB 


Figure  23.  The  inverse  MSE  of  the  golden  section  and  IGLS  algorithms  versus  the 
Cramer- Rao  bound  (CRB)  over  a  range  of  SNRs.  The  underlying  signal 
is  of  one  real  exponential  with  A  =  .8 


SNRdB 

Figure  24.  The  bias  of  the  golden  section  and  IGLS  algorithms  over  a  range  of 
SNRs.  The  underlying  signal  is  of  one  real  exponential  with  A  =  .8 
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likelihood  cost  function  in  summation  notation  so  that 


(187) 


L(A)  = 


EM-1  \2m 

m=0 


If  we  let  c  and  A  represent  the  amplitude  coefficient  and  exponential  of  the  underlying 
signal  of  y[m],  respectively,  and  we  let  2  represent  the  unknown  exponential  for 
estimation,  we  can  say 


(188) 


L{z-,  c.  A,  M)  = 


1 

^m=0 


y2m 


Since 


(189) 


M-l 


m=0 


1_^2M 

1-^2 


in  closed  form,  we  know 

L{z]  c.  A,  M) 


(190) 


(1  -  z")  (E"r„‘  cA»z»  +  E";o‘ 

1_22M 

<^(1  -  (E"r„‘(A2)")' 

I  _  22M  + 

20(1  -  z-^)  SS 

1-22“ 

(1-2:")(E"=>H2’")" 

1  -  Z2" 


Also,  since 

M-\  1  _  (\y\M 

(191)  Y.  =  'v 

TO=0  ^ 
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in  closed  form,  we  know 


(192) 


L{z]  c,  A,  M) 


c'^  {l  -  {Xz)^y  {1  -  z^)  ^ 

(1  -  _  xzf 

2c{l-{Xzr){l-z^)E^I^w[m]z 

(1  -  z^^){l  -  Xz) 

{Y:^I^w[m]z-^f 

1  -  z2m 


Observe  that  the  first  rational  function  of  Equation  192  exactly  models  the  cost 
function  of  the  noiseless  signal,  and  therefore,  the  additional  two  rational  functions 
exactly  model  the  contribution  of  measurement  noise  in  the  cost  function  of  the 
noisy  signal,  L.  Our  maximization  plan  is  predicated  on  the  fact  that  there  exists 
coefficients  for  the  numerator  and  denominator  polynomials  of  the  rational  functions 
that  model  L.  If  known,  these  coefficients  could  be  used  with  the  derivative  of  L 
to  find  the  2:  values  that  locally  maximize  and  minimize  L.  The  real  value  of  z, 
in  the  interval  (0,1),  that  yields  the  global  maximum  of  L,  gives  the  maximum 
likelihood  solution.  Unfortunately,  the  three  rational  functions  that  sum  to  L  each 
have  numerator  and  denominator  polynomials  of  at  least  2M  degrees.  With  any 
significant  length  of  signal,  M,  finding  an  analytical  solution  for  the  maximum  of  L 
is  difficult.  With  some  insight  though,  a  rational  low  order  function  approximation 
of  L  can  be  created  with  a  relatively  small  degree  numerator  polynomial  and  an  even 
smaller  degree  denominator  polynomial.  To  do  so,  we  first  observe  the  behavior  of 
the  functions  1  —  {Xz)^  and  1  —  z^^  as  M  increases.  From  Figures  25  and  26  we 
see  that  function  evaluations,  for  z  near  A  =  .8,  quickly  go  to  one  as  M  increases. 
For  approximating  L,  we  assumed  that  by  M  =  25  both  functions  evaluate  to  one. 
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Figure  25.  Evaluations  of  1  —  (Xz)^  for  A  =  .8  and  z  =  .7,  .8,  and  .9  with  respect 
to  M. 
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Therefore, 


L{z]c,X,M)  fa 


(193) 


c^Cl  - 

(iW  + 

2c(l-z^)S"ri;w[m|2" 

(1-Az) 

(1  -  (ES 

1  -  2A2  +  A222  ^ 

2c(l  —  Xz  —  +  Xz^)  Z)m=0 

1  -  2Az  + 

(1  -  2A2  +  (A2  -  1)22  +  2A^3  _  ;^2^4)  ^^M-1 

1  -  2Az  +  A222 


Again  the  first  rational  function  of  the  model  for  L  represents  the  noiseless 
signal.  The  noiseless  function  is  a  quadratic  polynomial  over  a  quadratic  polynomial. 
To  verify  our  assumptions  and  modeling  thus  far,  we  used  an  interpolation  routine  to 
estimate  the  coefficients  of  a  quadratic  over  a  quadratic  rational  function  that  fit  the 
cost  function  of  a  noiseless,  one  real  exponential  signal.  We  then  used  the  coefficients 
to  determine  the  z  that  maximizes  L  with  the  analytical  derivative  approach  just 
described. 

The  interpolation  routine  used  stems  from  the  Cauchy  rational  interpolation 
problem  and  utilizes  the  principle  of  “divided  differences”  as  explained  in  Hilde¬ 
brand  (22:51-79)  and  Scharf  (61:505-508).  In  1990,  Kumaresan  applied  divided 
differences — to  estimate  the  coefficients  of  the  transfer  function  associated  with 
summed  exponentials — with  some  success  (37).  It  was  his  work  that  inspired  us 
to  use  divided  differences  for  estimating  the  coefficients  of  the  rational  maximum 
likelihood  cost  function.  Unlike  ourselves,  Kumaresan’s  effort  was  not  motivated  by 
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the  maximum  likelihood  approach  to  parameter  estimation.  His  algorithm  was  a 
frequency  domain  analog  of  the  extended  Prony  method  with  the  same  performance 
as  the  original  extended  Prony  method  (37). 

We  introduce  approximation  by  divided  differences  for  our  application,  by  first 
illustrating  its  relationship  to  the  linear  interpolation  problem  and  then  extending  it 
to  the  rational  function  interpolation  problem.  The  properties  of  divided  differences 
are  numerous.  We  only  identify  those  properties  that  lead  to  the  rational  function 
coefficient  estimator. 

Assume  U{z)  is  exactly  defined  by  the  linear  model 
(194)  U{z)  =  uq-\-u\z 


over  a  defined  range  of  z.  We  can  then  say  the  ratio 


(195) 


U{z^)  -  U{zq) 

Zi  -  Zo 


is  a  constant,  independent  of  Zq  and  zi  in  the  range  of  z.  The  ratio  is  called  the  first 
order  divided  difference  of  U{z)  and  is  denoted 


(196) 


U[zq,Zi] 


U[zi]  -  U[zo] 
zi  -  Zo 


where  U[zi]  =  U{zi). 

Furthermore,  if  U{z)  is  exactly  defined  by  a  two  degree  polynomial,  then 
U[zo,  zi]  is  a  function  of  z,  and  the  ratio 


(197) 


U[zi,Z2]  -  U[zo,Zi\ 
Z2  -  Zq 
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is  also  a  constant,  independent  of  zq,  zi,  and  Z2  in  the  range  of  This  ratio  is  called 
the  second  order  divided  difference  of  [/(z)  and  is  denoted 


(198) 


U[zq,ZuZ2]  = 


U[zi,Z2]  -  U\zq,zx] 
Z2  -  Zo 


Recursively,  we  can  continue  increasing  the  order,  K,  of  the  divided  difference 
ratio  with  the  equation 

(199)  Cl[z„.  z„ . . .  ,  z;,]  = 

Zk  -  Zq 

For  example,  assume  U(z)  is  exactly  defined  by  a  two  degree  polynomial  such  that 


(200) 


U(z)  =  «0  +  'l^lZ  +  U2Z^. 


Then  as  defined. 


(201) 


U[z,] 

U[zq,Zi] 


U[zq,Zi,Z2] 


U[zq,Zi,Z2,Zz] 


Uq  +  UiZq  +  U2Zq 

Up  +  UiZi  +  U2zl  -  {uq  +  UiZq  +  U2Zq) 
Zl  -  Zp 

Uljzi  -  Zp)+U2{zf  -  zg) 

Zl  -  Zp 
Ui  +  U2{zi  +  Zp) 

Ui  +  U2{Z2  +  Zl)  -  (Ui  +  U2{zi  +  2:0)) 
Z2  -  Zp 

U2{Z2  +  Zi-  ZjZp) 

Z2  -  Zp 

U2  =  constant 

U2  —  U2 
Zp  -  Zp 

0. 
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This  example  illustrates  an  important  property  of  divided  differences:  when 
the  order  of  the  divided  difference  exceeds  the  degree  of  the  underlying  polynomial, 
the  divided  difference  ratio  is  zero.  This  property  extends  to  all  orders  of  divided  dif¬ 
ferences  that  exceed  any  degree  polynomial  when  the  underlying  function  is  exactly 
defined  by  the  polynomial  (22:57).  If  we  assume  a  function  is  closely  modeled  by 
a  Qth  degree  polynomial,  we  also  assume  all  Kth.  order  divided  differences  greater 
than  Q  are  equal  to  0. 

Another  feature  of  divided  differences,  needed  to  explain  our  rational  function 
coefficient  estimator,  can  be  extracted  from  the  recursive  definition  of  the  Kth  order 
divided  difference.  Equation  199.  We  see  that  the  desired  order  divided  difference 
is  a  function  of  the  previous  order  divided  differences.  When  the  definitions  of  the 
previous  order  divided  differences  are  substituted  back  into  Equation  199,  a  more 
generalized  divided  difference  equation  results  (22:55).  In  the  new  equation 

. = 

where 

K 

(203)  W[zi)  =  n(-^*  -  ^j) 

>=0 


Returning  to  the  rational  maximum  likelihood  cost  function,  let  L  take  the 

form 


(204) 


L{z) 


C/(^) 
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implying  that  the  degrees  of  the  numerator  and  of  the  denominator  of  L  are  R  and 
Q,  respectively.  Equivalently,  let 


(205) 


V(z)  =  L{z)U(z). 


From  our  previous  divided  difference  example,  we  know  that  if  we  assume  V[z)  is 
well  modeled  by  an  Rth.  order  polynomial,  we  have 

(206)  v\z„,z„...,zk]  =  Z^^  =  0 
provided  that  K  >  R.  Furthermore,  if  K  >  R  + 

(207)  vlzo,z„...,z^]  =  't'^^^=0 

holds  for  the  polynomial  V{z)  times  the  polynomial  z^.  When  we  combine  Equa¬ 
tion  205  and  Equation  207  we  obtain,  for  K  >  i?  +  s. 


(208) 


L(zi)U(zi)z^ 
■^*=0  W(zi) 


Wizi)  ^q=0 


UqZfzf 


y^Q  y  yK  L{zi)  g 


g+a 


0 

0 

0. 


Recall  that  Equation  208  holds  for  all  Zi  in  the  range  of  z  associated  with 
the  polynomial  assumption.  By  choosing  at  least  Q  values  of  s  and  designating 
K  1  values  of  Zi^  {zo,zi,. ..  ,zk),  where  K  >  i?  -f  s,  we  can  develop  a  system 
of  linear  equations  for  estimating  the  Q  +  \  coefficients  of  U{z)^  (uojMI)-  •  •  ,uq). 
In  the  noiseless  approximation  of  L  for  example,  the  numerator  and  denominator 
polynomial  degrees,  Q  and  R  respectively,  are  equal  to  two.  Therefore,  after  choosing 
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an  s  of  0  and  1,  and  letting  mq  normalize  to  1,  we  can  create  the  exactly  determined 


system  of  two  equations 


where  L{zi)  is  evaluated  with  the  data,  y[m],  under  the  maximum  likelihood  cost 
function 


(210) 


L{zi) 


2-(m=0  "i 


After  estimating  the  coefficients  of  U(^z)  with  Equation  209  we  return  to  the 
relationship 


(211) 


V{zi)  =  L{zi)U{zi) 

^  '£r=0VrZi  =  L{Zi)'£^^oUgZf. 


With  T  +  1  values  of  designated,  where  T  >  R,  Equation  211  can  be  used  in 
a  system  of  equations  for  estimating  the  coefficients  of  V{z).  Under  the  noiseless, 
Q  =  R  =  2,  model  assumption  we  can  create  the  overdetermined  system  of  linear 
equations  in  the  unknowns  vo,vi,...vr 

L{zo)  Sg=0 
L{zi)  Sg=o 

L{zt)  Z)g=0 
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Continuing  with  the  noiseless  model  of  L  analysis,  the  estimated  numerator 
and  denominator  polynomial  coefficients  can  be  used  with  to  find  the  maxima 
and  minima  of  L.  The  derivative  of  L  takes  the  form 

dL  _  d  I vq  +  viz  +  V2z'^\ 

dz  92  \  1  +  uiz  +  / 

_  {vi  +  2u22)(1  +  UiZ  +  U2z'^)  -  (uq  +  VjZ  +  V2z‘^){ui  +  2u2z) 

(1  +  UiZ  +  U2Z'^y 

,  _  (wi  -  UiVo)  +  2(u2  -  U2Vo)z  +  {uiV2  -  U2Vi)z'^ 

^  ^  ~  {l  +  mZ  +  U2Z^y 

We  know  If  =  0  when  the  numerator  of  |f  is  equal  to  zero.  Therefore,  we  attain 
the  maxima  and  minima  by  rooting  the  numerator  polynomial  of  |f .  The  real  root 
in  the  range  (0, 1)  that  eflfects  the  global  maximum  L  evaluation  is  considered  the 
maximum  likelihood  solution. 

Figures  27,  28,  and  29  serve  to  validate  our  divided  differences  based  estimation 
algorithm  and  model  assumptions  for  the  noiseless  case.  In  the  figures,  actual  L 
and  modeled  L  evaluations  are  plotted  over  the  range  of  2,  for  data  record  lengths 
of  M  =  10,  25  and  50  respectively.  The  noiseless  signal  was  created  with  the  same 
parameters  used  at  the  beginning  of  this  chapter.  A  fourth  order,  K  =  A,  divided 
difference,  with  five  equally  spaced  values  of  Zi  between  (0, 1),  was  used  in  an  exactly 
determined  system  of  equations  for  estimating  the  coefficients  of  the  two  degree 
denominator  polynomial,  U{z).  Because  two  equations  were  necessary  for  the  two 
unknown  coefficients  (uq  was  assumed  normalized  to  1),  s  values  of  0  and  1  were 
sufficient.  Therefore,  K  =  A>  R-\-  s  satisfies  the  minimum  divided  difference  order 
for  both  equations.  One  hundred  values  of  2i,  equally  spaced  on  the  interval  (0, 1), 
were  designated  to  estimate  the  coefficients  of  the  two  degree  numerator  polynomial, 
V(z).  This  resulted  in  100  equations  for  the  three  unknowns  in  the  overdetermined 
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Figure  27.  Evaluation  of  the  actual  L  and  the  noiseless  modeled  L  over  the  range 
of  2  for  noiseless  data  with  a  record  length  of  10.  The  signal  exponential 
is  A  =  .8  and  the  maximum  L  modeled  occurs  at  z  =  .8125. 


Figure  28.  Evaluation  of  the  actual  L  and  the  noiseless  modeled  L  over  the  range 
of  z  for  noiseless  data  with  a  record  length  of  25.  The  signal  exponential 
is  A  =  .8  and  the  maximum  L  modeled  occurs  at  2  =  .8005. 
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Figure  29.  Evaluation  of  the  actual  L  and  the  noiseless  modeled  L  over  the  range 
of  z  for  noiseless  data  with  a  record  length  of  50.  The  signal  exponential 
is  A  =  .8  and  the  maximum  L  modeled  occurs  at  z  =  .7995. 


linear  system.  After  the  coefficients  were  estimated,  maximizing  z  values  of  .8124, 
.8005,  and  .7995,  for  M  =  10,  25,  and  50  respectively,  were  calculated  from  the 
maximizing  roots  of  the  numerator  polynomial  of 

From  the  figures  and  maximizing  values  of  z,  we  see  that  the  algorithm  and 
model  assumptions  are  accurate  and  valid  for  the  noiseless  case.  As  expected,  in¬ 
creasing  the  data  record  length  from  10  to  25  increased  the  estimator  accuracy,  but 
increasing  the  data  record  length  from  25  to  50  had  no  appreciable  affect,  because  at 
those  values  of  M  the  low  degree  model  assumption  is  appropriate.  We  also  noted 
that  increasing  the  divided  difference  order,  K,  for  the  U{z)  coefficient  estimates 
was  unnecessary  and  in  fact  reduced  estimator  accuracy.  Additionally,  varying  the 
number  of  equations  for  the  ^(z)  coefficient  estimates  from  50  to  200  had  little  effect 
on  the  solution. 
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Figure  30.  Evaluations  of  the  actual  L  and  the  modeled  L  over  the  range  of  z  for 
noisy  data  with  a  record  length  of  25.  The  signal  exponential  is  A  =  .8 
and  the  SNR  is  3  dB.  The  rational  function  model  consisted  of  a  two 
degree  polynomial  over  a  two  degree  polynomial. 


The  divided  differences  maximum  likelihood  estimator  just  developed  is  not 
predicated  on  a  unimodal  cost  function.  When  the  actual  cost  function  is  multi¬ 
modal  over  the  range  of  z — as  is  the  case  in  low  SNR  scenarios — a  rational  function 
will  model  multi-modality.  Disregarding  the  inadequacies  of  a  two  degree  polynomial 
over  a  two  degree  polynomial  rational  function  for  modeling  the  cost  function  of  a 
noisy  signal,  the  noiseless  divided  difference  algorithm  just  created  was  utilized  on 
the  same  signal  with  additive  noise  at  3  dB  SNR.  The  data  record  length  was  25. 
Figure  30  illustrates  the  results.  The  modeled  L  is  indeed  multi-modal,  but  its 
maximum  is  not  in  the  vicinity  of  the  actual  maximum  of  L. 


When  we  recall  the  noisy  model  of  L, 


L{z',c,X,M)  « 


(214) 


1  -  2A2  +  A2z2 

2c(l  -Xz-z'^  +  Az^)  w[m]z^ 
l-2Xz  +  X^z'^ 

(1  —  2Az  +  (A^  —  l)z^  +  2A2:^  —  X^z'^)  (Em=o 
1  -  2A2  +  A2z2 


we  note  a  common  two  degree  denominator  polynomial  exists,  but  the  second  and 
third  rational  function  numerators  are  calling  for  M+2  and  2M+2  degree  polynomi¬ 
als  respectively.  Therefore,  a  divided  difference  algorithm  for  noisy  signals  requires  a 
higher  degree  numerator  polynomial  in  its  noisy  rational  function  model.  Increasing 
the  degree  to  2M-I-2  is  not  acceptable  because  of  the  increased  numerical  complexity. 

We  observed  that  the  first  few  terms  of 

M-l 

(215)  Y. 

m—O 

are  the  most  influential  because  of  the  decreasing  nature  of  z'^  as  m  increases,  and 
used  this  rational  to  justify  a  smaller  degree  increase  in  the  numerator  polynomial. 
The  coefficients  for  at  least  a  four  degree  numerator  polynomial  are  clearly  supported 
in  the  third  rational  function  of  Equation  214.  Thus,  we  assumed  a  four  degree 
polynomial  over  a  two  degree  polynomial  rational  function  would  be  sufficient  for 
modeling  the  noisy  L. 

The  previous  divided  difference  algorithm  is  easily  generalized  for  accommodat¬ 
ing  a  four  degree  numerator  polynomial.  A  divided  difference  order,  K  =  Q  >  R  +  s, 
was  utilized  to  estimate  the  two  unknown  coefficients  of  the  denominator  polyno- 
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Figure  31.  Realization  1  evaluations  of  the  actual  L  and  the  modeled  L  over  the 
range  of  2:  for  noisy  data  with  a  record  length  of  25.  The  signal  expo¬ 
nential  is  A  =  .8  and  the  SNR  is  3  dB.  The  rational  function  model 
consisted  of  a  four  degree  polynomial  over  a  two  degree  polynomial. 


mial,  U{z),  and  200  Zi  values  between  (0, 1)  were  designated  for  estimating  the  five 
unknown  coefficients  of  the  numerator  polynomial,  V{z). 

We  see  from  Figures  31  and  32  that  the  divided  difference  algorithm,  with  a  four 
degree  numerator  polynomial  over  a  two  degree  denominator  polynomial,  provides 
a  modeled  L  similar  to  the  actual  L  of  a  noisy  25  data  point  signal  at  3  dB  SNR. 
Note  that  for  two  different  realizations,  neither  the  maximum  of  the  modeled  L  or 
the  actual  L  resides  at  the  correct  2:  estimate  of  .8.  This  variance  can  be  anticipated 
when  recalling  from  CRB  theory  that  the  minimum  variance  of  all  estimators  must 
increases  as  SNR  increases. 

To  accurately  access  the  performance  of  the  noisy  divided  difference  algorithm, 
the  same  Monte-Carlo  experiment,  as  performed  on  the  golden  section  search  and 
IGLS  algorithms,  was  conducted.  For  a  quick  review  of  the  experiment’s  parame- 
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Figure  32.  Realization  2  evaluations  of  the  actual  L  and  the  modeled  L  over  the 
range  of  2  for  noisy  data  with  a  record  length  of  25.  The  signal  expo¬ 
nential  is  A  =  .8  and  the  SNR  is  3  dB.  The  rational  function  model 
consisted  of  a  four  degree  polynomial  over  a  two  degree  polynomial. 
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Figure  33.  The  inverse  MSE  of  the  divided  difference,  golden  section,  and  IGLS 
algorithms  versus  the  CRB  over  a  range  of  SNRs.  The  underlying  signal 
is  of  one  real  exponential  with  A  =  .8 

ters,  400,  twenty-five  data  point  realizations  of  y  were  created  for  each  SNR.  The 
underlying  signal  possessed  a  A  of  .8  and  a  c  of  1. 

The  performance  of  all  three  estimators  is  illustrated  in  Figures  33  and  34. 
The  divided  difference  algorithm  performed  slightly  better  than  the  other  two 
algorithms  at  low  SNR.  The  improved  performance  is  attributed  to  its  multi-mode 
fitting  capability.  As  the  SNR  increased,  the  divided  difference  algorithm  performed 
slightly  worse  than  the  other  two  algorithms.  This  trend  continues  as  the  SNR 
increases  to  its  maximum  for  machine  precision. 

The  divided  difference  based  estimation  algorithm  is  best  suited  for  noise  cor¬ 
rupted  signals  with  low  SNRs.  At  moderate  to  high  SNRs,  the  algorithm  is  too 
complex  and  the  modeling  is  excessive.  However,  further  experimentation  with  the 
algorithm  itself  is  recommended.  Investigation  for  appropriate  divided  difference 
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Figure  34.  The  bias  of  the  divided  difference,  golden  section,  and  IGLS  algorithms 
over  a  range  of  SNRs.  The  underlying  signal  is  of  one  real  exponential 
with  A  =  .8 


orders,  K,  and  system  of  equations  sizes  will  provide  additional  insight  into  the 
problem.  Additionally,  simplified  cost  functions  for  two  and  three  mode,  complex 
and  real  applications  may  exist. 

In  conclusion  of  the  one  real  exponential  general  analysis,  the  divided  difference 
algorithm’s  minimal  improvement  over  the  golden  section  search,  at  low  noise,  re¬ 
moves  our  concerns  for  estimator  degradation  due  to  multi-modality.  If  the  proposed 
problem  can  be  restricted  to  that  of  a  single  real  exponential  in  noise,  we  recommend 
the  golden  section  search  because  of  its  consistently  good  performance  over  all  SNRs. 
However,  if  the  SNR  of  the  signal  is  not  very  low,  the  IGLS  algorithm’s  performance 
is  definitely  satisfactory. 
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5.3  Two  Real  Exponentials 

The  next  phase  of  our  real  exponential  parameter  estimation  analysis  includes 
adding  a  second  exponential  for  an  in  depth  look  at  the  summed  exponential  problem. 
The  two  real  exponential  problem  is  especially  enlightening  because  minimization  of 
the  classical  maximum  likelihood  cost  function  can  be  visualized  on  a  two  dimensional 
contour  plot.  Consider  the  25  data  point  observation,  y,  with  underlying  parameters 
Cl  =  .5,  C2  =  1,  Ai  =  .8,  and  A2  =  .5  and  the  maximum  likelihood  cost  function 

(216)  L{zuZ2)  =  f{l  -  ViV^V)-^V^)y 
where 

V  =  ii(zi)  v(z2) 

(217)  S(z„)  =  [1  z.  zl  ... 

and  zi  and  Z2  represent  the  unknown  exponential  values  to  be  estimated.  Figure  35  is 
a  contour  plot  of  the  surface  generated  by  the  cost  function  evaluated  over  the  range 
of  zi  and  Z2  at  an  SNR  of  30  dB.  The  two  plus  signs  identify  the  location  of  two 
desired  solutions  and  help  illustrate  the  symmetry  of  the  solution  space.  The  intense 
relief  on  the  diagonal  is  brought  about  by  the  singularity  that  occurs  in 
when  zi  fn  Z2.  Additionally,  we  observed  that  the  general  shape  of  the  maximum 
likelihood  contour  surface  is  relatively  invariant  to  changes  in  SNR.  Like  in  the  one 
mode  analysis,  additional  multi-modal  curves  do  not  appear  until  SNR  low. 

A  two  mode  nonlinear  search  of  the  cost  function  for  the  maximum  likelihood 
solution  appears  to  be  achievable,  but  before  embarking  on  that  path,  consider  the 
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Figure  35.  Contour  plot  of  the  mciximum  likelihood  cost  function  evaluated  over 
the  range  of  zi  and  Z2  with  underlying  parameters  ci  =  .5,  C2  =  1, 
Ai  =  .8,  and  A2  =  .5  at  an  SNR  of  30  dB. 
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Figure  36.  Contour  plot  of  the  maximum  likelihood  cost  function  evaluated  over 
the  range  of  bi  and  62  with  underlying  parameters  ci  =  .5,  C2  =  1, 
Ai  =  .8,  and  A2  =  .5  at  an  SNR  of  30  dB. 

cost  function  after  being  transformed  by  the  relationship 
(218)  (I  -  V{V^V)-^V'^)  = 

Figure  36  is  a  contour  plot  of  the  30  dB,  two  mode,  linear  prediction  coefficient  based, 
maximum  likelihood  surface.  A  plus  sign  identifies  the  desired  solution.  To  facilitate 
a  one-to-one  mapping  between  Zn  and  6„,  bo  was  assumed  to  be  one.  Immediately, 
we  see  that  there  is  no  symmetry  in  the  linear  prediction  coefficient  based  contour 
plot,  and  that  the  curved  valley,  enclosing  the  maximum  likelihood  solution  in  the 
previous  plot,  is  now  represented  as  a  straight  valley. 

A  closer  review  of  the  transformation  on  the  parameters  zi  and  Z2  limits  the 
feasibility  region  of  the  linear  prediction  coefficient  based  contour  plot  as  well.  Be¬ 
cause  zi  and  Z2  are  only  considered  in  the  interval  (0, 1),  and  because  they  are  the 
quadratic  roots  of  the  polynomial  formed  by  the  linear  prediction  coefficients,  we 
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know 


(219) 


k  =  -(zi  +  Z2) 


and 


(220) 


62  =  ^1-22- 


Therefore, 


(221)  -2  <  61  <  0 

and 

(222)  0  <  62  <  1. 


Additionally,  from  the  quadratic  equation 


(223) 


— 


-61  ±  y/bj  -  462 

2 


we  know  that  a  real  mandates 


(224) 


Therefore, 


(225) 


62  <  -fc?. 
4 
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Finally,  since 


(226)  462  >  Q, 

and  z  is  confined  to  the  interval  (0, 1),  we  know 


— 61  + — 4&2 

2 

< 

1 

\Jb\  —  462 

< 

2  -f-  61 

hi  -  462 

< 

4  +  4bi  -1-  bl 

—62 

< 

1  +  bi 

b2 

> 

1 

1 

The  boundaries  of  the  inequalities  just  identified  are  illustrated  in  Figure  37. 
The  label  A  is  assigned  to  the  62  >  0  boundary,  the  label  B  is  assigned  to  the 
62  >  —  &i  —  1  boundary,  and  the  label  C  is  assigned  to  the  62  <  boundary.  The 
boundaries  are  not  a  function  of  SNR,  and  their  relationship  to  the  30  dB  SNR 
maximum  likelihood  contour  plot  is  identified  in  Figure  38.  Interestingly,  the  curved 
boundary  C  equates  to  the  case  of  Zi  =  Z2,  but  in  the  linear  prediction  coefficient 
based  representation,  the  cost  function  is  not  subject  to  the  same  singularities  as 
previously  noted  in  the  zi  versus  Z2  plot. 

When  considering  a  search  plan  for  the  two  mode  maximum  likelihood  so¬ 
lution,  the  straight  valley — prevalent  in  all  two  mode,  linear  prediction  coefficient 
based,  contour  plots — attracts  particular  attention.  The  valley  consistently  runs 
through  the  feasibility  region,  and  if  the  two  mode  problem  could  be  reduced  to 
a  one  dimensional  line  search  along  the  valley,  considerable  efficiency  would  be  at¬ 
tained. 
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Figure  38.  Contour  plot  of  the  maximum  likelihood  cost  function  evaluated  over 
the  range  of  bi  and  62  at  an  SNR  of  30  dB.  The  boundaries  of  the 
feasibility  region  are  also  identified  with  solid  lines. 
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Initially,  a  simple  algorithm  was  devised  to  search  the  line,  drawn  between 
the  minimums  of  the  A  and  B  borders,  for  a  minimal  solution.  The  behavior  of 
the  maximum  likelihood  cost  function  along  the  A  and  B,  and  even  C,  borders  was 
similar  to  that  of  the  previous  one  mode  section:  L  evaluated  unimodally,  over  the 
range  of  the  border,  under  all  but  very  low  SNR  scenarios.  A  golden  section  search 
was  utilized  to  determine  the  minimums  of  the  A  and  B  borders  and  thus  estimate 
the  endpoints  of  a  line  in  the  valley.  A  third  golden  section  search  was  utilized  to 
identify  the  minimum  along  the  line  of  the  valley.  The  algorithm  always  provided 
reasonable  estimates,  but  as  SNR  decreased,  the  line  of  the  valley  appeared  to  only 
graze  the  true  minimum.  This  resulted  in  estimates  slightly,  but  consistently  inferior 
to  the  IGLS  method  of  searching  the  maximum  likelihood  surface. 

To  identify  the  reason  for  this  inferior  performance,  the  linear  valley  assumption 
was  reconsidered.  The  behavior  of  the  valley  in  general  is  explained  by  the  summed 
exponential’s  relationship  to  the  linear  difference,  or  linear  prediction,  equation. 
Recall  that  the  underlying,  two  mode,  noiseless  signal  is  a  solution  to 

(227)  5[m]  +  bis[m  —  1]  +  b2s[m  —  2]  =  0 


Again  bo  is  assumed  to  be  normalized  to  1.  Equation  227  can  be  rearranged  to  take 
the  form  of  a  line  expressed  in  the  unknowns  bi  and  62, 


(228) 


— s[m  — 1]^  5[m] 

s[m  —  2]  ^  s  m  —  2] 


The  slope  and  intercept  of  the  line  varies  as  5[m]  varies  from  m  —  0,1,...  ,M  —  1, 
but  all  lines  defined  by  Equation  228  intersect  at  the  desired  solution  for  bi  and  62- 
In  figure  39  the  lines  defined  by  Equation  228  are  incorporated  with  the  feasibility 
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Figure  39.  Contour  plot  of  the  mciximum  likelihood  cost  function  evaluated  over  the 
range  of  bi  and  62  at  an  SNR  of  30  dB.  The  boundaries  of  the  feasibility 
region  and  the  linear  prediction  solution  lines  are  also  identified  with 
solid  lines. 

region  and  the  contour  plots  for  the  30  dB,  two  mode,  problem.  Only  the  upper  and 
lower  lines  are  displayed  for  clarity.  From  the  figure,  the  relationship  between  the 
intersecting  lines  and  the  valley  of  the  contour  plot  is  illustrated. 

Although  this  relationship  has  been  identified,  the  lines  defined  by  Equa¬ 
tion  228  consist  of  entirely  unknown  information  in  an  estimation  context.  In  theory, 
a  line  in  the  valley  could  be  used  for  a  one  dimensional  search  of  the  minimum.  In 
practice,  accurately  estimating  the  parameters  of  such  a  line  requires  the  same,  if 
not  more,  effort  as  the  original  plan  of  searching  for  the  two  dimensional  minimum. 

With  the  one  dimensional  search  option  dismissed,  attention  was  given  to  im¬ 
proving  the  two  dimensional  search.  The  linear  prediction  coefficient  based  graphical 
representation  just  created  for  the  real,  two  mode,  problem  provides  considerable  in¬ 
sight  into  the  mechanics  of  the  IGLS  algorithm.  As  discussed  in  Chapter  IV,  initial¬ 
izing  the  IGLS  algorithm  with  a  least  squares  estimate  of  b  is  equivalent  to  building 
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Figure  40.  Location  of  6i  and  62  for  each  iteration  of  the  IGLS  algorithm  plotted 
with  respect  to  the  feasibility  region  for  the  40  dB  SNR,  two  mode,  real 
exponential  problem. 

the  B  matrix,  for  an  IGLS  iteration,  with  60  =  1  and  fei,  62,  •  •  •  ,  =  0.  In  the  two 

mode  real  exponential  problem,  the  point  61  =  62  =  0  resides  at  the  A  and  C  border 
intersection  of  the  feasibility  region.  In  Figure  40,  hi  and  62  estimates  from  each 
iteration  of  the  IGLS  algorithm  for  the  40  dB  SNR  problem  are  identified.  We  see 
that  after  initializing  at  bi  =  b2  =  0,  the  first  iteration  estimates  bi  and  62  in  the 
valley  and  near  the  center  of  the  feasibility  region.  In  subsequent  estimates,  61  and 
62  track  up  the  valley  to  the  desired  solution. 

When  each  iteration  of  the  IGLS  algorithm  is  observed  at  lower  SNRs,  subse¬ 
quent  estimates  of  bi  and  62  occasionally  occur  outside  of  the  feasibility  region.  In 
Figure  41  we  see  the  first  update  of  b\  and  62  is  in  the  valley  but  well  below  the 
feasibility  region.  For  this  realization  and  SNR,  subsequent  estimates  of  61  and  62 
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Figure  42.  Locations  of  bi  and  62  for  each  iteration  of  the  IGLS  algorithm  plotted 
with  respect  to  the  feasibility  region  for  the  20  dB  SNR,  two  mode,  real 
exponential  problem. 

track  up  the  valley  to  the  desired  solution.  However,  returning  to  the  feasibility 
region  in  subsequent  updates  is  not  always  the  case.  In  Figure  42  we  see,  for  a 
realization  at  20  dB,  the  first  update  occurs  outside  of  the  feasibility  region,  and 
the  subsequent  updates  track  further  away  from  the  region.  To  avoid  this  malady, 
alternative  initialization  points  for  the  IGLS  algorithm  were  considered. 

In  the  previous  work,  we  efficiently  found  the  location  of  the  minimums  on 
the  borders  of  the  feasibility  region  with  a  golden  section  search.  The  A  and  B 
border  minimums  typically  occur  in  the  valley  of  the  linear  prediction  coefficient 
based  contour  plots,  even  at  low  SNR.  Similarly,  for  the  same  realizations,  the  C 
border  minimum  typically  occurs  very  near  the  desired  solution.  In  Figure  43,  the 
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Figure  43.  Minimums  identified  on  the  A,  B,  and  C  borders  of  the  feasibility  region 
for  the  real,  two  mode,  problem  at  5  dB  SNR. 

border  minimums  are  identified  for  a  realization  of  the  standard  two  mode  problem 
at  5  dB  SNR. 

With  this  information,  a  Monte-Carlo  experiment  was  conducted  to  asses  the 
affect  of  initializing  the  IGLS  algorithm  from  the  A,  B,  and  C  border  minimums. 
The  results  of  the  experiment  are  displayed,  in  terms  of  inverse  MSE  and  bias,  in 
Figures  44,  45,  46,  and  47.  The  performance  of  the  IGLS  algorithm,  initialized 
from  the  least  squares  estimate  (6i  =  63  =  0),  versus  the  IGLS  algorithm,  initialized 
from  each  border  minimum,  is  illustrated  in  each  plot.  In  every  case,  15  iterations 
of  the  IGLS  algorithm  were  implemented.  From  the  figures,  we  conclude  that  very 
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Figure  44.  The  Ai  inverse  MSE  of  the  standard  IGLS  algorithm,  initialized  from 
6i  =  62  =  0,  and  the  IGLS  algorithm,  initialized  from  the  minimums  of 
the  A,  B,  and  C  borders.  Also  plotted  is  the  CRB. 


SNR  dB 

Figure  45.  The  Ai  bias  of  the  standard  IGLS  algorithm,  initialized  from  bi  =  b2  —  0, 
and  the  IGLS  algorithm,  initialized  from  the  minimums  of  the  A,  B,  and 
C  borders,  over  a  range  of  SNRs. 


Figure  46.  The  A2  inverse  MSE  of  the  standard  IGLS  algorithm,  initialized  from 
61  =  62  =  0,  and  the  IGLS  algorithm,  initialized  from  the  minimums  of 
the  A,  B,  and  C  borders.  Also  plotted  is  the  CRB. 


Figure  47.  The  A2  bias  of  the  standard  IGLS  algorithm,  initialized  from  61  =  62  =  0, 
and  the  IGLS  algorithm,  initialized  from  the  minimums  of  the  A,  B,  and 
C  borders,  over  a  range  of  SNRs. 
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Figure  48.  Realization  1  locations  of  6i  and  62  for  each  iteration  of  the  IGLS  algo¬ 
rithm,  initialized  at  two  different  points.  One  starting  point  was  at  the 
least  squares  solution  and  the  other  starting  point  was  at  the  minimum 
of  the  C  border.  The  SNR  was  20  dB. 

little  is  gained  by  initializing  the  IGLS  algorithm  at  a  point  nearer  to  the  desired 
solution. 

We  postulate  that  given  enough  iterations,  the  IGLS  algorithm  will  terminate 
at  the  same  estimate  regardless  of  being  initialized  at  the  least  squares  solution 
or  a  border  minimum.  This  statement  is  corroborated  by  observing  the  bi  and  62 
estimates  from  each  IGLS  iteration  on  a  realization  that  terminates  outside  of  the 
feasibility  region.  In  Figure  48  the  61  and  62  estimates  from  each  iteration  of  the 
IGLS  algorithm  initialized  at  the  C  border  minimum  are  overlayed  on  estimates  from 
the  IGLS  algorithm  initialized  at  the  least  squares  solution.  The  same  realization. 
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Figure  49.  Realization  2  locations  of  6i  and  62  for  each  iteration  of  the  IGLS  algo¬ 
rithm,  initialized  at  two  different  points.  One  starting  point  was  at  the 
least  squares  solution  and  the  other  starting  point  was  at  the  minimum 
of  the  C  border.  The  SNR  was  20  dB. 

at  20  dB  SNR,  used  in  Figure  42  was  used  for  Figure  48.  Both  applications  of  the 
IGLS  algorithm  terminated  at  the  same  location  outside  of  the  feasibility  region. 

The  same  phenomena  for  a  different  realization  at  20  dB  SNR  is  demonstrated 
in  Figure  49.  In  this  case,  the  successive  estimates  of  61  and  62  from  the  IGLS 
algorithm,  initialized  at  6^  =  62  =  0,  track  towards  the  feasibility  region  but  fall  short 
of  the  A  border.  The  successive  estimates  of  61  and  62  from  the  IGLS  algorithm, 
initialized  at  the  C  border,  track  below  the  A  border  towards  the  same  point  outside 
of  the  feasibility  region  as  well. 

From  the  Monte-Carlo  experiment  and  these  sample  realizations,  we  conclude 
that  the  IGLS  algorithm  is  a  robust  estimator  for  the  global  minimum  of  the  max- 
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imum  likelihood  surface.  The  algorithm  is  essentially  unaffected  by  alternative  ini¬ 
tializations  on  the  feasibility  region  border  and  by  departures  from  the  feasibility 
region  for  subsequent  iterations. 

A  final  Monte-Carlo  experiment  was  conducted  to  investigate  the  relationship 
between  the  global  minimum  of  the  maximum  likelihood  surface  and  the  feasibility 
region.  With  a  well  defined  feasibility  region  and  a  readily  available  maximum  likeli¬ 
hood  cost  function — as  is  the  case  for  the  real,  two  mode  problem — implementation  of 
a  constrained  optimization  routine  is  straight  forward.  For  this  reason,  the  MATLAB 
constr  function  was  utilized  on  the  same  realizations  of  y  created  for  the  previ¬ 
ous  Monte-Carlo  experiment.  The  MATLAB  constr  function  employs  a  sequential 
quadratic  programming  (SQP)  method  which  can  be  assisted  with  closed  form  ex¬ 
pressions  for  the  first  derivatives  of  the  cost  function  and  region  constraints  (20:3 — 
9,3 — 12).  The  derivative  of  the  cost  function  and  region  constraints  are  developed 
in  Appendix  D.  Similar  to  the  alternative  initializations  of  the  IGLS  algorithm,  the 
SQP  algorithm  was  initialized  from  the  minimums  of  the  A,  B,  and  C  borders. 

The  performance  of  the  SQP  algorithm,  in  terms  of  inverse  MSB  and  bias  is 
compared  to  the  IGLS  algorithm,  initialized  at  the  least  squares  solution,  in  Fig¬ 
ures  50,  51,  52,  and  53.  From  the  inverse  MSB  plots,  it  appears  that  the  SQP 
algorithm  performs  slightly  better  than  the  IGLS  algorithm  at  lower  SNR,  but  the 
behavior  of  the  bias  at  the  same  SNR  makes  the  estimates  suspect.  Also  note,  the 
alternative  initializations  have  minimal  effect  on  the  SQP  algorithm.  Regardless, 
any  improved  performance  is  minor  when  realizing  that  the  IGLS  algorithm  is  not 
restricted  to  the  real,  two  mode,  problem.  Additionally,  the  IGLS  algorithm  is  com¬ 
putationally  more  efficient  than  the  SQP  algorithm. 
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Figure  50.  The  Ai  inverse  MSE  of  the  standard  IGLS  algorithm,  initialized  from 
bi  =  b2  =  0,  and  the  SQP  algorithm,  initialized  from  the  minimums  of 
the  A,  B,  and  C  borders.  Also  plotted  is  the  CRB. 


SNR  dB 


Figure  51.  The  Ai  bias  of  the  standard  IGLS  algorithm,  initialized  from  bi  =  b2  =  0, 
and  the  SQP  algorithm,  initialized  from  the  minimums  of  the  A,  B,  and 
C  borders. 
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Figure  52.  The  A2  inverse  MSE  of  the  standard  IGLS  algorithm,  initialized  from 
bi  =  b2  =  0,  and  the  SQP  algorithm,  initialized  from  the  minimums  of 
the  A,  B,  and  C  borders.  Also  plotted  is  the  CRB. 
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Figure  53.  The  A2  bias  of  the  standard  IGLS  algorithm,  initialized  from  61  =  62  =  0, 
and  the  SQP  algorithm,  initialized  from  the  minimums  of  the  A,  B,  and 
C  borders. 


5-4  Chapter  Conclusion 

Beyond  the  real,  two  mode,  problem,  visualization  of  the  mechanics  of  the 
IGLS  algorithm  is  difficult.  As  dimensionality  increases,  estimation  accuracy  de¬ 
creases,  but  given  the  performance  of  the  IGLS  algorithm  for  the  one  and  two  mode 
problem,  we  find  no  reason  to  search  for  a  better  multi-dimensional  maximum  like¬ 
lihood  algorithm.  We  have  shown,  in  isolated  scenarios  at  low  SNR,  specialized 
algorithms  can  be  developed  to  estimate  parameters  slightly  better  than  the  IGLS 
algorithm,  but  ultimately,  these  efforts  serve  better  to  exalt  the  robust  capabilities 
of  the  IGLS  estimator.  We  move  on  to  the  application  portion  of  the  dissertation 
with  confidence  in  the  IGLS  algorithm. 
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VI.  Deep  Level  Transient  Spectroscopy  Application 

6. 1  Chapter  Introduction 

As  alluded  to  throughout  this  dissertation,  a  particular  physics  experiment  that 
motivates  our  real  exponential  analysis  is  deep  level  transient  spectroscopy  (DLTS). 
DLTS  is  a  capacitance  transient  thermal  scanning  technique  used  to  characterize 
defects  present  in  semiconductors  (45:3023).  We  begin  this  chapter  by  developing  the 
fundamentals  of  DLTS.  Ultimately,  we  show  that  the  signals  from  a  DLTS  experiment 
are  well  modeled  by  superimposed,  real,  decaying  exponentials.  Each  exponential, 
and  corresponding  amplitude  coefficient,  characterize  a  defect.  Typically  the  number 
of  detectable  defects,  N,  is  small. 

Previous  approaches  to  estimating  the  parameters  of  DLTS  signals  include  box¬ 
car  integrator  analysis  (45),  modulating  function  waveform  analysis  (58),  and  linear 
prediction  (LP)  analysis  (65).  The  boxcar  integrator  approach  is  the  original  tech¬ 
nique  proposed  by  Lang — the  founder  of  DLTS — in  1974.  It  is  an  analog  approach 
and  is  still  popular  today.  In  the  DLTS  fundamentals  section,  we  show  why  it  is 
limited.  The  modulation  function  approach,  like  linear  prediction,  is  digital,  but 
unlike  linear  prediction,  is  heuristic.  Recently,  Doolittle  and  Rohatgi  published  a 
series  of  papers  highlighting  the  benefits  of  linear  prediction  analysis  over  modulat¬ 
ing  function  analysis  (10,  11,  12).  The  LP  approach  they  use  for  DLTS  analysis  is 
taken  from  the  work  of  Shapiro  et  al.  published  in  1984.  In  our  research,  we  have 
developed  concepts,  unavailable  to  Shapiro  et  al,  that  significantly  improve  on  the 
LP  approach  to  DLTS  analysis.  This  work  has  been  accepted  for  publication  by  the 
DLTS  community  (31).  The  increased  fidelity  in  our  parameter  estimates  has  lead 
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to  defect  characterizations  that  were  previously  unobserved.  These  too  have  been 
submitted  for  publication  (62). 

After  developing  the  fundamentals  of  DLTS,  we  review  the  LP  approach  of 
Shapiro  et  al.  As  we  review  the  previous  LP  approach  to  DLTS  analysis,  we  introduce 
our  improvements.  We  then  verify  our  contribution  on  actual  DLTS  signals.  At  the 
end  of  the  Chapter,  we  include  additional  algorithms,  developed  by  us,  for  DLTS. 

6.2  DLTS  Fundamentals 

In  this  section,  we  identify  the  DLTS  signal  and  its  relationship  to  the  su¬ 
perimposed  exponential  model.  We  begin  by  developing  the  semiconductor  and  pn 
junction  theory  necessary  for  explaining  DLTS.  Volumes  1  and  2  of  the  series  by 
Pierret  and  Neudeck  (55,  53),  and  the  book  by  Sze  (69)  are  our  primary  references. 
The  actual  DLTS  signal  development  follows  the  methodology  of  Lang  (45,  46)  and 
Elsaesser  (14).  Elsaesser  clearly  extends  DLTS  to  the  digital  signal  processing  (DSP) 
environment. 

Consider  the  n-type  and  p-type  semiconductor  materials  of  Figure  54a  prior  to 
contact.  Both  n-type  and  p-type  materials  are  crystal  lattices  primarily  composed 
of  atoms  with  four  valence  electrons  such  as  silicon.  Si.  The  lattice  in  the  n-type 
material  is  doped  with  a  small  quantity  of  atoms  with  five  valence  electrons  such 
as  arsenic.  As.  The  lattice  in  the  p-type  material  is  doped  with  a  small  quantity 
of  atoms  with  three  valence  electrons  such  as  boron,  B.  The  doped  atoms  of  both 
n-type  and  p-type  materials  are  called  impurities  even  though  their  introduction  into 
the  crystal  lattice  is  intentional.  The  four  valence  electrons  of  the  primary  atoms  in 
the  lattice  tend  to  share  electrons  with  the  other  atoms  to  form  covalent  bonds.  At 
room  temperature,  the  impurities  described  are  more  likely  to  form  covalent  bonds 


125 


(a) 


P 


n 


Junction 

E 


V  it 

\  , 

\  / 

Depletion 

Region 


Figure  54.  Doped  n-type  and  p-type  semiconductor  materials  (a)  before  and  (b) 
after  forming  a  'pn  junction. 


with  the  primary  atoms  of  the  lattice  than  keep  their  neutral  states.  Therefore, 
the  four  valence  electrons  of  the  impurity  atoms  in  n-type  material  form  covalent 
bonds  with  the  primary  atoms  of  the  lattice,  and  the  fifth  electrons  are  “donated” 
to  the  conduction  band.  The  impurity  atoms  in  this  case  are  called  donors,  and 
the  material  is  called  n-type  because  of  the  addition  of  negatively  charged  carriers. 
With  p-type  material,  the  exact  opposite  occurs.  The  impurity  atoms,  with  three 
valence  electrons,  “accept”  electrons  from  the  primary  atoms  of  the  lattice  to  form 
covalent  bonds.  Consequently,  positively  charged  “holes”  are  created  in  the  valence 
band.  The  impurity  atoms  in  this  case  are  called  acceptors,  and  the  material  is 
called  p-type  because  of  the  addition  of  positively  charged  carriers.  The  covalent 
bonding  and  resulting  charged  carriers  are  illustrated  for  n-type  and  p-type  silicon 
in  Figure  55. 

The  impurities  described  thus  far  ionize,  or  emit  electrons  or  holes,  at  room 
temperatures.  Their  ionization  energies  are  just  inside  the  energy  band  gap  bound¬ 
aries.  The  ionization  energy  of  donors  in  n-type  material  is  just  below  the  conduction 
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Figure  55.  Covalent  bonding  and  resulting  charged  carriers  for  (a)  n-type  silicon 
and  (b)  p-type  silicon.  The  figure  was  obtained  from  Sze  (69). 
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band  edge,  and  the  ionization  energy  of  acceptors  in  p-type  material  is  just  above 
the  valence  band  edge.  Consequently,  we  refer  to  these  intentional  impurities  as 
shallow  donors  and  shallow  acceptors,  respectively.  Impurities  with  more  than  five 
or  less  than  three  valence  electrons  can  be  unintentionally  introduced  into  n-type  or 
p-type  material  as  well.  Their  ionization  energies  are  typically  deep  into  the  energy 
band  gap,  and  their  emission  of  electrons  or  holes  is  significantly  more  complicated 
than  that  of  shallow  donors  and  acceptors.  Without  knowing  more  about  the  deep 
level  impurity,  it  must  be  assumed  that  each  impurity  has  the  potential  to  capture, 
as  well  as  emit,  both  electrons  and  holes.  Defects  in  the  crystal  structure,  alone  or 
in  combination  with  deep  level  impurities,  also  are  assumed  to  have  the  potential 
to  capture  and  emit  both  electrons  and  holes.  In  general,  we  refer  to  deep  level 
impurities  and  crystal  structure  defects  as  deep  level  defects.  DLTS  is  a  technique 
for  identifying  and  characterizing  theses  deep  level  defects. 

Before  explaining  DLTS,  we  need  to  develop  some  key  concepts  for  pn  junc¬ 
tions.  Consider  the  n-type  and  p-type  materials  joined  in  Figure  54b,  and  assume 
there  are  no  defects  in  either  materials.  Because  of  the  larger  concentration  of 
electrons  on  the  n-type  side  of  the  junction  versus  the  p-type  side  of  the  junction, 
electrons  diffuse  into  the  p-type  material.  Likewise,  holes  from  the  p-type  material 
diffuse  into  the  n-type  material.  Recall,  the  impurity  atoms  are  covalently  bonded 
into  the  lattice.  With  the  negatively  charged  carriers  leaving  the  n-type  material,  a 
positive  space  charge  forms  on  the  n-type  side  of  the  junction,  likewise,  a  negative 
space  charge  forms  on  the  p-type  side  of  the  junction.  This  space  charge  differential 
creates  an  electric  field,  E,  in  the  direction  from  positive  space  charge  to  negative 
space  charge.  See  Figure  54b.  The  electric  field  is  opposite  the  direction  of  diffusion 
for  both  the  negatively  and  positively  charged  carriers.  Therefore,  an  equilibrium  be¬ 
tween  diffusion  and  space  charge  differential  is  achieved.  At  equilibrium,  the  electric 
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field  is  significant.  It  sweeps  all  the  positively  and  negatively  free  charged  carriers 
out  of  a  region  near  the  junction.  Consequently,  this  region  is  dubbed  the  depletion 
region.  Again,  see  Figure  54b. 

The  width,  w,  of  the  depletion  region  is  a  function  of  the  concentration  of 
shallow  donors.  No,  and  acceptors,  Na,  doped  into  the  n-type  and  p-type  materials. 
It  is  also  a  function  of  the  built  in  voltage,  Vbi,  due  to  the  space  charge  differential. 
The  equation  for  the  width  is  (69:78) 


(229) 
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[  NaNd 

1 

2 


where  e  is  the  dielectric  constant  of  the  semiconductor  and  q  is  the  electronic  unit 
charge.  If  Na  >  No,  the  depletion  region  is  shifted  over  to  the  n-type  side  of  the 
pn  junction.  If  N a  »  Nd  the  portion  of  the  depletion  region  on  the  p-type  side  of 
the  junction  is  so  small  that  the  depletion  region  can  be  considered  to  exist  only  in 
the  n-type  material  (69:78).  The  junction  is  denoted  p+n,  and  Equation  229  can  be 
simplified  to 


(230) 


w  = 


See  Figure  56a.  In  our  development,  we  only  consider  p+n  junctions.  The  develop¬ 
ment  for  n+p  junctions  is  complementary  and  straight  forward.  We  will  see  that  one 
sided  junctions  assist  DLTS  by  isolating  the  location  of  deep  level  defects. 

By  observing  Equation  230  we  can  see  a  relationship  between  the  width  of  the 
depletion  region  and  the  voltage.  If  a  reverse  bias  voltage,  Vr,  is  added  to  the  built 
in  voltage,  Vbi,  the  width  of  the  depletion  region  will  increase.  See  Figure  56b.  The 
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Figure  56.  Doped  p+n  junction  with  a  variable  depletion  region  width  dependent 
on  (a)  built  in  voltage,  (b)  built  in  voltage  plus  reversed  biased  voltage, 
and  (c)  built  in  voltage  minus  forward  biased  voltage. 
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governing  equation  is  (69:80) 


(2e{VBi  +  VR)\^ 

\  qNn  )  • 

Similarly,  if  a  forward  bias  voltage,  Vp,  is  subtracted  from  the  built  in  voltage, 
the  width  of  the  depletion  region  will  decrease.  See  Figure  56c.  In  this  case,  the 
governing  equation  is  complicated  by  the  fact  that  the  forward  bias  voltage  forces 
charged  carriers  into  the  depletion  region.  The  actual  equation  is  unnecessary  for 
our  development  and,  for  clarity,  is  omitted. 

We  have  now  established  that  under  biased  voltage  conditions,  the  width  of 
the  depletion  region  can  be  varied.  Because  the  shallow  donors  are  easily  ionized 
at  room  temperature,  and  because  the  charged  carriers  are  quickly  swept  out  of  the 
depletion  region  by  the  electric  field,  changes  in  the  bias  voltage  result  in  changes  in 
the  width  of  the  depletion  region  with  very  little  temporal  lag.  This  is  not  the  case 
when  deep  level  defects  are  present.  In  DLTS,  deep  level  defects  can  be  characterized 
by  their  temporal  infiuence  on  the  width  of  the  depletion  region  after  changes  in  the 
bias  voltage  are  made. 

As  mentioned  earlier,  we  assume  each  deep  level  defect  has  the  potential  to  emit 
and  capture  electrons  and  holes.  The  propensity  of  these  four  events  is  governed  by 
the  conservation  of  energy.  Possible  energy  interactions  with  a  deep  level  defect  are 
the  absorption  or  emission  of  phonons  (thermal  processes),  the  absorption  or  emission 
of  photons  (optical  processes),  and  interactions  with  free  conduction  band  electrons 
or  valence  band  holes  (Auger  processes).  If  bias  voltages  on  the  p+n  semiconductor 
are  applied  in  the  dark — which  is  always  the  case  in  DLTS — optical  processes  can  be 
ignored.  Also,  under  forward  biased  conditions.  Auger-capture  processes  dominate, 
over  emission  processes.  If  the  width  of  the  depletion  region  is  made  small  enough 
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by  forward  biasing,  the  previously  depleted  region  is  flooded  with  free  electrons  and 
holes.  If  the  duration  of  this  even  is  sufficient,  we  can  assume  any  deep  level  defect  in 
the  n-type  material  has  captured  electrons  or  holes.  If  the  defect  primarily  captures 
electrons,  we  call  it  an  electron  trap,  and  if  the  defect  primarily  captures  holes,  we 
call  it  a  hole  trap.  If  after  forward  biasing,  the  width  of  the  depletion  region  is 
opened  and  held  at  its  built  in  voltage  width,  or  even  further  at  a  reversed  biased 
width,  thermal  processes  will  prompt  the  traps  in  the  depletion  region  to  emit  their 
captured  electrons  or  holes  with  an  emission  rate  that  can  be  used  to  characterize  the 
particular  defect.  If  temperature  is  held  constant  (isothermal)  during  emission,  the 
decrease  in  concentration  of  traps  with  captured  electron  or  holes,  N!^,  is  modeled 
with  (14:26) 

(232)  N^{T,t)  = 

where  r  is  the  electron  or  hole  emission  rate,  and  Nt  is  the  total  concentration  of  the 
trap — with  or  without  captured  electrons  or  holes.  The  temperature  dependence,  T, 
is  brought  in  to  Equation  232  through  the  emission  rate  and  is  discussed  in  detail 
shortly.  Finally,  because  the  electric  held  sweeps  the  free  charged  carriers  out  of  the 
depletion  region,  re-trapping  under  reversed  bias  conditions  does  not  occur. 

Consider  a  p+n  junction  with  a  single  deep  level  electron  or  hole  trap  in  the 
depletion  region  of  n-type  material.  If,  under  isothermal  conditions,  we  forward  bias 
the  semiconductor  to  All  the  trap  with  electrons  or  holes,  and  then  reverse  bias  the 
semiconductor  to  insure  the  trap  is  well  inside  the  depletion  region,  the  width  of 
the  depletion  region  will  change  as  the  capture  concentration  of  the  trap  changes 
with  time  due  to  the  change  in  depletion  charge.  By  incorporating  the  capture 
concentration  model.  Equation  232,  into  the  reverse  biased  depletion  region  width 
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equation,  we  can  model  this  change  with  (14:27) 


(233) 


to(T,t) 
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The  process  just  described  is  the  procedure  employed  in  DLTS.  The  width 
of  the  depletion  region  can  be  monitored  by  measuring  the  capacitance  across  the 
region.  The  measured  capacitance  is  governed  by  the  classic  relationship  (46:95) 

(234) 

where  A  is  the  area  of  the  junction.  Therefore,  the  capacitance  per  unit  area,  C,  for 
the  depletion  region  is 


w 


(235) 
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This  implies 


(236) 


C^(T  t)  = 
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In  both  terms  on  the  right  side  of  Equation  236,  we  know 


(237) 
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This  is  the  square  of  the  capacitance  per  unit  area  of  the  reversed  biased  depletion 
region  after  the  trap  has  emitted  all  of  its  captured  electrons  or  holes.  Let 


(238) 


qeNp 

2(Vbj  +  Vr) 


where  Css  is  the  steady  state  capacitance  per  unit  area.  Then,  we  can  say 


(239)  c\J,t)  =  Cl  +  Cl^e-’‘. 

In  a  DLTS  experiment,  this  is  the  square  of  the  measured  capacitance  of  the  depletion 
region,  at  a  constant  temperature,  for  the  time  period  immediately  after  the  forward 
bias  is  removed  and  the  reverse  bias  is  applied.  If  the  trap  is  primarily  an  electron 
trap,  the  charge  distribution  is  such  that  ^  is  negative  and  the  capacitance  transient 
will  increase  to  Css-  If  the  trap  is  primarily  a  hole  trap,  the  charge  distribution  is 
such  that  ^  is  positive  and  the  capacitance  transient  will  decrease  to  Css- 

Multiple  hole  or  electron  traps  are  linearly  modeled  by  (14:34) 

(240)  C\T,  t)  =  Cl  +  f;  Cl  (^)  e—. 

„=1  \^D/n 

If  discretized,  this  expression  can  fit  the  form  of  the  superimposed  exponential  signal 
model  used  throughout  this  dissertation.  Let 

(241)  5[m]  =  Co  +  X)  CnA™ 

n—1 
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where 


(242)  A„  =  e-’-^ 

and  8  is  the  sample  interval.  Also,  for  now,  assume  the  baseline  constant  cq  is 
the  amplitude  coefficient  of  an  added  exponential  with  Aq  =  1.  The  estimation 
algorithms  developed  thus  far  can  provide  estimates  of  cq,  c„,  and  A„.  In  the  latter 
sections  of  this  chapter,  we  will  address  more  appropriate  approaches  to  estimating 
the  intercept  cq.  Nevertheless,  with  estimates  of  c„  and  A„,  we  can  solve  for  the 
trap  concentration  and  the  emission  rate,  r,  of  each  defect.  With  trap  emission 
rate  estimates  obtained  over  a  range  of  temperatures,  we  can  further  characterize 
the  trap.  To  do  so,  we  first  need  to  develop  more  background  on  electron  and  hole 
trap  emission  rates. 

In  the  DLTS  experiment,  we  can  limit  our  discussion  to  thermal  emission  rates 
which  can  be  modeled  as  a  function  of  temperature  (14:13-17).  Because  the  thermal 
emission  rate  has  dependence  on  the  thermal  capture  rate,  we  can  show 

(243) 

where  the  n  subscript  indicates  electron  trap,  the  p  subscript  indicates  hole  trap,  Qn,p 
are  assumed  constants,  are  thermal  capture  cross  sections,  k  is  the  Boltzmann 
constant,  and  AEn  is  the  separation  energy  between  the  lower  edge  of  the  conduction 
band  and  the  energy  level  of  an  electron  trap  {AEn  =  Ec  -  Et),  and  AEp  is  the 
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separation  energy  between  the  energy  level  of  a  hole  trap  and  the  upper  edge  of  the 
valence  band  {AEp  =  Et  —  E^). 

Lang’s  analog  DLTS  analysis  technique  (45),  which  is  also  explained  shortly, 
estimates  the  inverse  of  the  trap  emission  rate,  r  =  K  Therefore,  estimating  r, 
rather  than  r,  has  become  the  convention.  From  the  thermal  emission  rate  equation, 
consider 


rn,p  =  (Q7i,pTV„,p)  ^  kT 

(244)  =>  =  iQn,p(rn,p)~^ 

=4>-  In  (T  T,i,p)  =  In  {Qn,p(^n,p)  d"  ■ 

The  last  expression  has  linear  form.  If  isothermal  capacitance  transient  signals  are 
created  and  measured  at  various  temperatures,  and  the  emission  rates  are  estimated 
for  each  signal,  In  (TV„^p)  can  be  plotted  versus  The  slope  of  the  plot  corresponds 
to  AE,  which  can  be  solved  for  the  energy  level  of  the  trap,  and  the  intercept  of  the 
plot  corresponds  to  In  {Qn,pO'n,p)j  which  can  be  solved  for  the  thermal  capture  cross 
section  of  the  trap.  Plots  of  this  nature  are  called  Arrhenius  plots. 

When  Lang  introduced  DLTS,  high  speed  digitizers  were  not  available  for  dig¬ 
ital  analysis.  Despite  this,  he  proposed  an  easy  to  implement  analog  approach  for 
estimating  the  r  of  various  isothermal  transients  and  subsequent  Arrhenius  plotting. 
At  this  time,  Lang’s  approach  is  developed  to  identify  its  limitations  and  to  give 
physical  insight  into  the  DLTS  problem.  At  the  end  of  this  chapter,  Lang’s  ap¬ 
proach  is  used  to  highlight  the  difficulty  of  obtaining  adequate  estimates  on  actual 
DLTS  data. 

First,  Lang  assumes  there  is  only  one  trap  in  the  depletion  region  and  that  its 
concentration,  Nt,  is  much  less  than  the  concentration  of  the  shallow  donors,  Nj). 
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Figure  57.  Capacitance  transients  for  a  single  hole  trap  in  the  depletion  region  of 
a  p+n  junction  at  various  temperatures.  Also  shown  is  the  AC(T)  for 
ti  and  t2  in  each  transient.  The  figure  was  obtained  from  Lang  (46). 

Then,  a  binomial  expansion  of  Equation  239  leaves 
(245)  C{T,t)  = 

The  left  side  of  Figure  57  is  a  plot  of  the  capacitance  transients  for  a  hole  trap 
at  various  temperatures.  As  expected  with  the  thermal  emission  rate  equation,  t 
decreases  as  temperature  increases.  Notice,  from  the  right  side  of  Figure  57,  the 
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change  in  capacitance  between  times  and  t2, 


(246) 


AC(T)  =  C(T,«,)-C(T,i2), 


is  minimal  at  low  temperatures,  minimal  at  high  temperatures,  and  maximal  at  one 
temperature  in  the  middle.  With  the  capacitance  transient  model  for  a  single  trap, 
Equation  245,  we  can  obtain  an  analytical  expression  for  AC'(T)  where 


AC'(T)  = 
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From  the  previous  development,  we  know  AC'(T)  is  a  function  of  temperature 
through  the  inverse  of  the  thermal  emission  rate,  r.  By  differentiating  AC'(T)  with 
respect  to  r,  and  equating  to  zero,  we  see 


_  h  —  h 
ln(t2Ai)’ 

Therefore,  referring  to  Figure  57,  the  temperature  associated  with  AC{T)jnax  corre¬ 
sponds  to  a  singe  trap  capacitance  transient  with  r^ax  =  • 

By  changing  ti  and/or  t2,  the  peak  for  AC{^)max  changes  and  a  new  temper¬ 
ature  associated  with  AC'(T)TOaa:  corresponds  to  a  new  r^nax  =  in*(f^/f\)  •  process 
can  be  repeated  as  many  times  as  necessary  to  plot  In  (TV)  versus  in  an  Arrhe¬ 
nius  plot.  This  approach  is  known  as  the  rate  window  approach.  By  anticipating  the 
and  t2  samples  necessary  to  produce  an  adequate  Arrhenius  plot,  double  boxcar 
integrators  can  be  placed  in  line  to  calculate  AC(T)  for  every  transient.  The  rate 
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window  approach  is  quick  and  simple,  and  yields  single  trap  parameter  estimates 
without  a  digitizer. 

However,  the  rate  window  approach  does  not  have  multi-trap  capability,  and 
as  will  be  shown  at  the  end  of  this  chapter,  degrades  with  noise.  The  advent  of 
high  speed  digitizers  and  improvements  in  digital  signal  processing,  namely  linear 
prediction,  have  provided  some  relief  to  these  problems.  In  the  next  section,  we 
review  an  early  LP  approach  to  DLTS  analysis  and  present  our  improved  iterative 
generalized  least  squares  (IGLS)  approach. 

6.3  Linear  Prediction  for  DLTS  Analysis 

As  mentioned  in  the  beginning  of  this  chapter,  Shapiro  et  al.  (65)  introduced 
an  LP  based  approach  for  DLTS  analysis  in  1984.  Their  proposed  algorithm  is 
essentially  the  overmodeled  least  square  algorithm  (OLS)  developed  in  Chapter  III. 
In  DLTS,  Shapiro  et  al.  recognized  that  because  the  concentrations  of  any  deep 
level  traps,  Nt,  are  usually  much  smaller  than  the  concentration  of  the  donors,  Nd, 
the  baseline  constant,  cq,  is  usually  much  larger  than  the  amplitude  coefficients,  c„. 
Treating  cq  as  the  amplitude  coefficient  of  an  additional  exponential,  Aq,  without 
properly  constraining  Aq  to  equal  one,  seriously  degraded  the  performance  of  their 
algorithm.  They  improved  their  estimates  by  differencing  the  discrete  transient 
at  successive  points  prior  to  implementing  OLS  (65:3457-3458).  In  a  sense,  they 
obtained  estimates  from  the  derivative  of  the  data,  which  eliminated  the  constant. 
At  the  end  of  this  section,  we  take  a  similar  tack,  but  first  we  must  recognize  another 
approach. 

In  1992,  Doolittle  and  Rohatgi  (10)  introduced  hardware  to  estimate  and  re¬ 
move  (null)  the  baseline  constant  from  the  capacitance  transients.  They  then  applied 
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the  OLS  algorithm  of  Shapiro  et  al.  and  obtained  good  results.  For  comparison,  we 
simulated  capacitance  transient  data  without  a  baseline  constant  and  tested  the 
performance  of  the  IGLS  algorithm  against  the  OLS  algorithm. 

It  should  be  noted,  that — from  the  work  of  Kumaresan  and  Tufts  (72) — Shapiro 
et  al.  (65)  and  Doolittle  and  Rohatgi  (11)  both  assumed  any  extraneous  A„  estimates 
would  be  complex  and  used  this  as  a  criterion  for  actual  A„  acceptance.  We  find  this 
criterion  to  be  insufficient  and  have  added  the  energy  sort  described  in  Chapter  III 
to  all  OLS  estimates. 

For  our  simulation,  one  hundred  data  point  realizations  of  y  were  created  with 
randomly  generated  Gaussian  noise  and  the  underlying  signal  parameters:  c\  —  —  1, 
C2  =  —.5,  Al  =  .8,  Aa  =  .9.  The  noise  contributions  were  governed  by  SNR,  in  dB, 
under  the  standard  formula.  For  each  realization,  10  iterations  of  the  IGLS  algorithm 
and  three  separate  runs  of  the  OLS  algorithm,  with  prediction  orders  P  =  20,  30 
and  40,  were  performed. 

Figures  58,  59,  60,  and  61  are  inverse  MSE  and  bias  plots  for  comparing  the 
performance  of  the  IGLS  algorithm  with  the  OLS  algorithm.  The  MSE  and  bias 
were  calculated  after  200  Monte-Carlo  simulations  for  each  SNR.  From  the  figures, 
we  note  IGLS  consistently  provides  lower  variance  estimates  than  OLS,  regardless 
of  prediction  order,  over  a  usable  range  of  SNRs.  Also,  above  an  SNR  threshold — 
approximately  45  dB — IGLS  essentially  attains  the  CRB.  The  performance  of  OLS 
decreases  as  SNR  increases  because  the  overmodeling  becomes  inappropriate. 

Given  the  experiments  we  performed  in  Chapter  III  on  the  sum  of  two  complex 
exponentials,  these  results  for  the  sum  of  two  real  exponentials  are  expected.  The 
IGLS  algorithm  should  replace  the  OLS  algorithm  for  linear  prediction  based  DLTS 
when  the  baseline  constant,  cq,  is  nulled  with  hardware.  However,  we  claim,  the 
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Figure  59.  The  Ai  bias  of  the  IGLS  and  OLS  estimators  over  a  range  of  SNRs. 
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Figure  60.  The  A2  inverse  MSE  of  the  IGLS  and  OLS  estimators  over  a  range  of 
SNRs.  Also  plotted  is  the  CRB. 
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Figure  61.  The  A2  bias  of  the  IGLS  and  OLS  estimators  over  a  range  of  SNRs. 
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need  for  the  nulling  hardware  is  unnecessary.  Unlike  the  OLS  algorithm,  the  IGLS 
algorithm  can  be  accurately  extended  to  incorporate  baseline  constant  estimation. 
The  extension  is  not  obvious  when  the  algorithm  is  derived  form  the  maximum 
likelihood  (ML)  methodology,  but  with  the  introduction  of  our  linear  prediction 
methodology,  the  ML  cost  function  for  superimposed  exponentials  with  a  baseline 
constant  follows  nicely. 

In  our  development,  we  keep  the  differencing  approach  suggested  by  Shapiro 
et  al.  (65).  Incidentally,  differencing  was  also  recommended  by  Hildebrand  (22:458- 
462),  in  1956,  for  the  general  superimposed  exponential  problem  with  a  baseline 
constant.  Unfortunately,  both  Hildebrand  and  Shapiro  et  al.  did  not  account  for 
the  effects  of  noise  in  their  linear  prediction  methodology.  In  this  dissertation,  we 
have  established  the  inferior  performance  of  LP  based  estimators  without  accurate 
noise  analysis.  The  performance  of  differenced  LP  based  estimators  without  accurate 
noise  analysis  is  even  more  degraded.  As  alluded  to  before,  with  our  methodology, 
properly  accounting  for  the  effects  of  noise  in  differenced  linear  prediction  is  straight 
forward. 

Let 


yd[m]  =  y[m\  -  y[m  +  1] 

(249) 

=  Sd[rn]  +  w[m]  —  w[rn  +  1],  m  =  0, 1 

,...,M-2 

where 

N 

(250) 

Sd[m]  =  ^(1- A„)c„A;^. 

n=l 
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The  noiseless  signal,  is  also  a  solution  to  a  difference  equation  with  constant 

coefficients.  Therefore,  if  we  let 

(251)  Sd[m]  =  yd[m]  -  w[m]  +  w[m  +  1] 

we  know 

(252)  boydim]  +  biyd[m  -  1]  +  . . .  +  bNyd[m  -  N]  =  ed[m] 

where 


ed[m]  =  bQw[m\  +  biw[m  —  1]  +  . . .  +  bj^iw\m  —  — 

(253)  —bQw[m  +  1]  —  biw\m]  —  ...  —  biqw[m  —  N  +  1]. 

The  LP  error  ed[m]  has  more  elements  than  the  previous  e\m]  of  Chapter  III,  but  it 
is  still  composed  of  a  tractable  sum  of  weighted  Gaussian  random  variables.  Under 
an  overdetermined  set  of  linear  equations,  let 


(254) 


Ydb  =  ed 


where 


Yd  = 


Vdn 


(255) 


b  = 


e  = 


VdO  Vdl  •••  VdN 

yd[N  -  n]  yd[N  +  1  -  n]  •  •  •  yd[M  -  2  -  n] 

iT 

bo  bi  •  ■  ■  bjf 


ed[N]  ed[iV+l]  ...  ed[M-2] 
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The  data  matrix,  Y^,  has  dimensions  (M  —  iV  —  1)  X  (A^  +  1)  and  Toeplitz  structure. 
The  LP  error  vector  ej,  is  distributed  with  mean  vector  zero  and  covariance  matrix 
(j'^Rd-  If  we  define  the  (M  —  iV  —  1)  x  (M  —  1)  Toeplitz  matrices 

bo  bi  •  •  •  bif  0  •  •  •  0 

0  bo  bi  •  •  •  fejv  • .  : 

:  •••  •••  •••  0 

0  •  •  •  0  bo  bi  •  •  • 

0  6o  ^1  ■  ■  ■  ^jv+i  bj[^  0  •  •  •  0 

0  0  bo  bi  •  •  •  bjv  ’  •  • 

(257)  B2=  :  ;  0  , 

:  :  ■  • .  ‘ •  •  '  • .  ‘  •  •  '  • .  b^ 

0  0  .  0  bo  bi  •  •  •  tjv'+i 

then  the  differenced  LP  error  covariance  matrix  is  obtained  with 

(258)  Ed  =  2BiB[  -  BiB^  -  BzB^ 

and  the  generalizes  least  squares  (GLS)  estimator  becomes 

1 

(259)  b  = 

-(Yl 

As  before,  an  iterative  algorithm  naturally  follows.  After  rooting,  the  am¬ 
plitude  coefiicients,  cq  and  c„,  can  be  estimated  from  an  expanded  pseudo- inverse 


(256)  Bx  = 


and 
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estimator 


(260) 


Co 

c 


{VlV^X^Vly 


where  the  matrix  Vc^  consists  of  an  M  dimensional  column  of  ones  appended  in  front 
of  the  standard  Vandermonde  matrix,  V. 

In  simulations,  realizations  of  y  were  created  with  the  same  parameters  as 
earlier  in  this  section,  only  a  baseline  constant,  cq,  was  added  at  an  elevated  value 
of  50.  The  differenced  iterative  generalized  least  squares  (DIGLS)  estimator  was 
tested  against  a  differenced  overmodeled  least  squares  (DOLS)  estimator  with  an 
energy  sort.  The  experiment  was  conducted  identical  to  that  of  the  IGLS  and  OLS 
estimators  in  this  chapter. 

Figures  62,  63,  64,  and  65  are  inverse  MSE  and  bias  plots  for  comparing  the 
performance  of  the  DIGLS  algorithm  with  the  DOLS  algorithm.  Similar  to 
the  experiment  without  cq,  DIGLS  consistently  performs  better  than  DOLS  over 
a  usable  range  of  SNRs  and  attains  the  CRB.  Figure  66  illustrates  the  difference 
between  DIGLS  and  IGLS  for  the  Ai  estimates.  We  see  that  adding  the  cq  constant 
does  not  significantly  effect  the  accuracy  of  the  iterative  estimators.  Note,  the  error 
associated  with  nulling  is  not  reflected  in  the  IGLS  inverse  MSE  plot.  Obtaining  all 
estimates  with  software  may  indeed  be  more  accurate  than  introducing  hardware  for 
assisting  in  the  task.  In  the  next  section,  the  capabilities  of  the  DIGLS  algorithm 
are  verified  on  actual  DLTS  signals.  Like  many  DLTS  facilities,  we  do  not  utilize 
hardware  for  cq  nulling.  Therefore,  we  use  the  differenced  approach. 
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Figure  62. 


The  Ai  inverse  MSE  of  the  DIGLS  and  DOLS  estimators  over  a  range 
of  SNRs.  Also  plotted  is  the  CRB. 
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Figure  63.  The  Ai  bias  of  the  DIGLS  and  DOLS  estimators  over  a  range  of  SNRs 
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Figure  64.  The  A2  inverse  MSE  of  the  DIGLS  and  DOLS  estimators  over  a  range 
of  SNRs.  Also  plotted  is  the  CRB. 
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Figure  65.  The  A2  bias  of  the  DIGLS  and  DOLS  estimators  over  a  range  of  SNRs. 
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Figure  66.  Inverse  MSE  versus  SNR  plot  for  Ai.  DIGLS  and  IGLS  compared  on 
simulated  data  with  and  without  a  baseline  constant,  respectively. 


d.Jf.  DLTS  Experiments 

The  DIGLS  and  DDLS  estimators  were  compared  on  the  capacitance  transients 
of  two  different  semiconductors.  First,  the  arsenic  anti-site  defect,  Asoa)  in  GaAs — 
namely  the  EL2  level — was  considered.  The  EL2  deep  level  was  selected  for  initial 
verification  of  the  DIGLS  estimator  because  of  its  well  known  trapping  kinetics  and 
deep  level  parameters  (43,  73).  The  EL2  capacitance  transient  signals  were  recorded 
in  both  maximal  and  artificially  reduced  SNR  scenarios.  They  were  digitized  at 
isothermal  increments  of  4  AT  using  conventional  const  ant- volt  age  biasing  (CVDLTS) 
of  Ti/Pt/Au  Schottky  diodes  formed  on  low-temperature  grown,  Sb-doped,  GaAs 
substrate  material.  A  large  CVDLTS  reverse  bias  voltage  of  —5.0  volts  was  utilized 
to  maximize  the  width  of  the  depletion  region  and  minimize  the  non-exponential 
effects  due  to  a  free  carrier  tail  at  the  edge  of  the  depletion  region.  It  has  been 
shown,  that  the  large  reverse  bias  voltage  does  not  effect  the  expected  separation 
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energy,  AE„,  in  EL2  (5).  Forward  bias  filling  pulse  heights  and  pulse  durations, 
used  to  completely  saturate  the  traps,  were  1.5  volts  and  10  ms  respectively.  The 
resulting  capacitance  transients  were  sampled  at  100  /is  intervals  with  500  sample 
points  recorded  to  yield  a  50  ms  digitized  transient  period.  Noise  averaging  at 
each  isothermal  step  was  nominally  high  (100  transients)  to  facilitate  the  zero  mean 
Gaussian  assumption.  Artificial  SNR  reduction  was  accomplished  using  a  double 
correlated  mode  of  DLTS  operation  (DDLTS)  to  control  and  reduce  the  volume  of 
space  charge  from  which  emission  occurs.  Specifically,  the  difference  signal,  resulting 
from  the  capacitive  decay  for  differing  forward  bias  filling  pulse  heights  under  the 
same  measurement  bias,  was  analyzed.  Variations  in  the  forward  bias  difference, 
\VF1  —  VF2\,  resulted  in  controllable  variations  of  the  SNR. 

The  Arrhenius  plot  shown  in  Figure  67  is  for  comparing  the  performance  of 
the  DIGLS  and  DOLS  algorithms  on  the  EL2  deep  defect.  In  all  cases,  the  DIGLS 
estimator  was  implemented  with  10  iterations,  and  the  DOLS  estimator  was  im¬ 
plemented  with  a  prediction  order  P  =  30  and  an  energy  sort.  The  data  shown 
in  the  figure  correspond  to  the  solutions  for  a  single  mode  {N  =  1)  analysis  using 
the  DIGLS  and  DOLS  estimators  under  the  maximal  SNR  scenario.  Immediately 
apparent  from  the  Arrhenius  plots  is  the  very  significant  extension  of  the  range  of 
useful  emission  rate  data  in  the  case  of  the  DIGLS  estimator.  In  other  words,  the 
DIGLS  estimator  is  seen  from  the  figure  to  be  much  more  robust  at  the  lower  tem¬ 
peratures  where  the  capacitance  transient  is  significantly  truncated,  or  equivalently, 
is  characterized  by  slower  emission  rates.  A  more  important  point  to  make  regard¬ 
ing  the  data  shown  in  the  figure  is  that  the  separation  energy,  obtained  from  the 
resulting  linear  slope  of  the  DIGLS  emission  rate  estimates,  equals  791  meV.  This  is 
in  much  better  agreement  with  accepted  transient  spectroscopy  values  than  that  of 
the  DOLS  estimates.  The  best  fit  to  the  linear  region  data  obtained  from  the  DOLS 
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algorithm  yields  a  separation  energy  of  748  meV .  This  underestimates  even  the  low 
end  separation  energies  obtained  using  alternative  Hall  effect  measurements  (48). 

Figure  68  illustrates  more  concisely  the  superiority  of  the  improved  DIGLS 
linear  prediction  algorithm.  The  data  represents  the  DIGLS  and  DOLS  estimates 
on  EL2  transients  with  artificially  reduced  SNR.  The  DIGLS  algorithm  slightly  un¬ 
derestimates  the  EL2  energy  level  (-10%  )  but  the  DOLS  estimator  is  completely 
ineffective.  For  confidence  in  our  experiment,  we  have  shown  in  our  article  (31)  that 
the  OLS  algorithm,  implemented  by  Doolittle  and  Rohatgi  (12),  obtains  inferior 
estimates  to  our  DOLS  estimator  under  equivalent  EL2  conditions. 

Finally,  the  extent  of  signal  degradation  and  the  significance  of  the  improve¬ 
ments  afforded  by  the  DIGLS  estimator  can  be  fully  appreciated  by  observing  the 
rate  window  plots  of  the  maximal  and  degraded  SNR  scenarios.  They  are  illustrated 
in  Figures  69  and  70  respectively.  By  convention,  six  rate  windows  were  applied 
from  each  end  of  the  transients  with  ^2/^1  ratios  of  2,  5,  and  10,  respectively.  The 
degraded  SNR  rate  window  plot  is  unintelligible. 

After  verification  under  simulated  and  controlled  DLTS  experimental  condi¬ 
tions,  the  DIGLS  and  DOLS  estimators  were  compared  on  a  more  general  problem 
potentially  involving  multi-mode  transient  decay.  The  6H  polytypic  modification  of 
the  SiC  material  system  was  analyzed.  In  n-6H-SiC  bulk  substrate  material,  we  an¬ 
ticipate  a  DLTS  signal  with  two  resolvable  separation  energies.  To  date,  multi-mode 
DLTS  signals  have  not  been  resolved.  Only  heuristic  arguments  and  postulation  of 
their  existence,  based  on  experimental  data  yielding  closely  spaced  energy  levels, 
have  been  suggested  (16). 

Figure  71  is  a  rate  window  plot  for  n-6H-SiC  material.  Typically,  broad  over¬ 
lapping  peaks  and  shoulders  are  observed  in  the  rate  widow  plots  of  multi-mode 
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Figure  68.  Arrhenius  plot  for  EL2  with  artificially  degraded  SNR. 
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Figure  69.  Rate  window  plot  for  EL2  with  maximal  SNR. 
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Figure  71.  Rate  window  plot  for  n-6H-SiC. 
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materials.  This  is  not  the  case  in  Figure  71.  Similarly,  the  DOLS  estimator  only  de¬ 
tected  one  decay  mode.  However,  our  DIGLS  estimator  revealed  a  definite  indication 
of  the  presence  of  two  exponential  decay  modes.  Figure  72  illustrates  the  resulting 
Arrhenius  analysis  of  the  DIGLS  fitted  decay  time  constants  showing  convincing  ev¬ 
idence  for  a  two  mode  transient  decay.  The  separation  energies  of  509  and  543  meV 
obtained  from  the  linear  slope  are  seen  to  deviate  significantly  from  the  rate  window 
peak  analysis  estimate  of  760  meV.  This  is  not  surprising  if  we  recall  that  the  clas¬ 
sical  peak  analysis  method  is  invalid  if  more  than  a  single  decay  mode  exists  (45). 
The  results  presented,  may  be  the  first  explicit  DLTS  data  supporting  the  existence 
of  SiC  deep  level  energetic  pairs  for  an  apparent  spectral  feature  indicating  single 
mode  decay. 

The  apparent  success  obtained  in  resolving  closely  spaced  levels  of  the  electron 
traps  in  n-6H-SiC  led  us  to  apply  the  DIGLS  estimator  to  a  commonly  observed 
hole  trap  in  p-6H-SiC.  This  defect  has  been  observed  by  us  to  be  present  in  most 
substrate  wafer  material  and  is  readily  formed  upon  ion-implantation  into  epitaxial 
material  (63).  Figure  73  is  the  rate  window  plot  for  the  CVDLTS  transients.  In  the 
figure,  the  shoulder  (marked  by  the  large  arrow)  is  indicative  of  multi-mode  decay. 
However,  both  the  IGLS  and  DOLS  estimates  for  the  transients  indicate  the  pres¬ 
ence  of  only  a  single  exponential  component  with  a  separation  energy  of  861  meV . 
Figure  74  illustrates  the  fitted  data  for  this  defect.  The  striking  feature  of  the  DIGLS 
estimator  is  the  extension  of  useful  transient  data  by  almost  50  K,  or  10  transient 
emission  rate  data  points,  on  the  low  temperature  side.  Consequently,  characteriza¬ 
tion  of  the  defect  level  in  p-6H-SiC  can  be  made  with  increased  confidence. 
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Figure  72.  Arrhenius  plot  for  n-6H-SiC. 
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Figure  73.  Rate  window  plot  for  p-6H-SiC.  The  large  arrow  points  at  a  shoulder 
that  typically  represents  multi-mode  decay. 
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Figure  74.  Arrhenius  plot  for  p-6H-SiC. 


6.5  Additional  Estimation  Algorithms  for  DLTS 


In  the  course  of  our  research,  we  realized  the  exact  maximum  likelihood  cost 
function  for  the  one  mode  DLTS  signal  can  be  minimized  directly  without  differenc¬ 
ing  and  without  linear  prediction  coefficient  representation.  Let 

(261)  K.  =  [  I  vi 

where  1  is  a  M  x  1  vector  of  ones  and 

(262)  f5i=[l  Ai  A?  ...  Af-']"^, 

so  that 

(263)  y  =  Vc^  ^  +w. 

Cl 

From  Chapters  II  and  III  we  know  the  ML  solution  is  attained  by  maximizing 

(264)  L(VJ  =  fV„(V^V„)-'V^y. 

We  find  the  behavior  of  the  one  mode  DLTS  ML  cost  function  is  similar  to  the 
behavior  of  the  one  mode  real  exponential  ML  cost  function  analyzed  in  Chapter  V. 
Over  a  large  range  of  SNRs,  the  one  mode  DLTS  ML  cost  function  is  unimodal. 
Consequently,  with  only  one  variable,  Ai,  a  golden  section  search  can  be  implemented. 
Details  of  the  golden  section  search  are  given  in  Chapter  V. 

Figures  75  and  76  are  inverse  MSE  and  bias  plots  for  comparing  the  perfor¬ 
mance  of  the  DIGLS  algorithm  with  the  golden  section  search  (Gold  DLTS)  algo- 
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Figure  75.  The  inverse  MSE  of  the  DIGLS  and  Gold  DLTS  estinaators  over  a  range 
of  SNRs.  Also  plotted  is  the  CRB. 

rithm  for  the  one  mode  DLTS  problem.  The  Monte-Carlo  experiment  used  to  create 
the  plots  is  identical  to  the  experiments  accomplished  earlier  in  the  chapter  except 
the  underlying  signal  consists  of  only  one  real  mode  with  parameters  ci  =  —  1  and 
Ai  =  .8.  As  expected,  the  Gold  DLTS  algorithm  is  superior  at  lower  SNRs.  For 
the  one  mode  problem,  the  golden  section  search  is  more  effective  at  maximizing 
the  exact  ML  cost  function.  However,  the  Gold  DLTS  algorithm  is  restricted  to  one 
mode  DLTS  problems. 

In  addition  to  the  one  mode  DLTS  algorithm  just  delivered,  we  also  developed  a 
multi-mode  DLTS  algorithm  that  performs  differencing  implicitly.  In  this  algorithm, 
we  treat  the  baseline  constant  as  an  additional  exponential,  Aq,  with  an  amplitude 
coefficient,  cq,  but  we  constrain  Aq  to  equal  one  in  an  elegant  and  insightful  way. 
The  algorithm  essentially  follows  the  development  of  IGLS. 
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Figure  76.  The  bias  of  the  DIGLS  and  Gold  DLTS  estimators  over  a  range  of  SNRs. 
With  a  model  order  of  +  1,  consider  the  noiseless  LP  equation 

(265)  6os[^]  +  bis[m  —  1]  +  . . .  +  bffs[m  —  N]-\-  fej\r+is[m  —  (N  +  1)]  =  0. 

Recall,  the  estimates  of  A„  are  the  roots  of  the  polynomial  formed  from  the  LP 
coefficients.  Therefore,  we  assume. 

(266)  bQZ^'^^  +  biz^  +  b2Z^  ^  +  . . .  +  +  bj^^i  —  0. 


Since  cq  is  a  constant,  we  know  Aq  =  1  is  a  root  of  the  LP  polynomial.  Equation  266. 
Therefore, 


(267) 


^0  +  +  ^2  +  •  •  •  +  5jv  +  5jv+i  —  0 

^  bN+i  =  ~{bo  +  5i  +  ^2  +  •  •  •  +  ^iv)- 
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This  is  a  constraint  we  can  apply  in  the  standard  IGLS  development.  Since 

(268)  s[m]  =  y[m]  —  w[m], 
we  know 

+  hy[m  -  1]  +  . . .  +  huy[m  -  N]  +  bN+iy[m  -  (N  +  1)] 

(269)  =  bow[m]+biw[m-l]  + ...  +  btfw[m- N]  +  b^^iw[m- {N +  1)]. 

When  we  introduce  the  constraint  of  Equation  267,  we  can  say 

boiy[m]  -  y[m  -  {N  +  1)])  +  bi{y[m  -  1]  -  y[m  -  {N  +  1)])  +  . . .  + 
bNy{[m  -  N]-  y[m  -  (N  +  1)]) 

=  bow[m]  +  biw[m  —  1]  +  . . .  +  b^wlm  —  N]  — 

{bo  +  bi  +  b2  +  ...  +  6jv')w[m  -  (A^  +  1)] 

(270)  =  ec[m]. 

The  LP  error  ec[m]  is  still  composed  of  a  tractable  sum  of  weighted  Gaussian  random 
variables.  Under  an  overdetermined  set  of  linear  equations,  let 

(271)  Y,b  =  e. 
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where 


Vcn 


VcO  Vcl  •  ••  VcN 

(y[Ar  +  1  -  n]  -  ?/[0])  {y[N  +  2  -  n]  -  y[l]) 


^T 


(272) 


b  = 


e  = 


y{[M -l-n]-y[M  -  N -2]) 

T 

I 

bo  bi  •••  bpf 


ec[N  +  1]  ec[N  +  2]  •  •  •  ec[M  —  1] 


^T 


The  data  matrix,  Yc,  has  dimensions  (M  —  iV  —  1)  x  {N  +  1)  and  Toeplitz  structure. 
The  LP  error  vector  is  distributed  with  mean  vector  zero  and  covariance  matrix 
a^Rc-  If  we  define  the  (M  —  N  —  1)  x  M  Toeplitz  matrix 


(273) 


5.= 


bo  bi  •  •  • 

0  bo  bi 

0  •  •  •  0  bo 


— (6o+6i+."+i>)v)  0  •  •  • 

bif  — (6o+6i+."+fcjv)  ’ ' 


0 


bi 


bjf  — (fro+&i+...+fciv) 


then  the  differenced  LP  error  covariance  matrix  is  obtained  with 


(274) 


and  the  GLS  estimator  becomes 


Rd  =  BcB, 


(275) 


b  =  -(Y^ 
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SNR  dB 

I 

Figure  77.  The  Ai  inverse  MSE  of  the  DIGLS  and  IGLS  cO  estimators  over  a  range 
of  SNRs.  Also  plotted  is  the  CRB. 

As  before,  an  iterative  algorithm  naturally  follows,  and  after  rooting,  the  am¬ 
plitude  coefficients,  cq  and  c„,  can  be  estimated  from  the  expanded  pseudo-inverse 
estimator 

(276)  =  (VjV..)-'Vly. 

c 

Figures  77,  78,  79,  and  80  are  inverse  MSE  and  bias  plots  for  comparing  the 
performance  of  the  IGLS  algorithm  with  the  cq  constraint  (IGLS  cq)  and  the  DIGLS 
algorithm  derived  earlier  for  the  two  mode  DLTS  problem.  The  performance 
of  the  two  algorithms  is  essentially  identical  until  below  the  SNR  threshold.  This 
is  expected  because  both  algorithms — although  differing  in  appearance — attempt 
to  minimize  the  ML  cost  function.  This  bolsters  our  claim  that  introducing  our 
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Figure  79.  The  A2  inverse  MSE  of  the  DIGLS  and  IGLS  cq  estimators  over  a  range 
of  SNRs.  Also  plotted  is  the  CRB. 
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SNR  dB 

Figure  80.  The  A2  phase  bias  of  the  DIGLS  and  IGLS  cq  estimators  over  a  range 
of  SNRs. 

LP  methodology  brings  valuable  insight  to  the  superimposed  exponential  parameter 
estimation  problem. 

6. 6  Chapter  Conclusion 

We  began  the  chapter  by  developing  the  fundamentals  of  DLTS,  and  its  rela¬ 
tionship  with  the  superimposed  exponential  model.  We  discussed  immediate  esti¬ 
mation  applications  of  the  work  developed  in  the  previous  chapters  for  DLTS  signals 
with  the  baseline  constant  estimated  and  removed  by  hardware.  Then,  we  expanded 
our  IGLS  algorithm  to  estimate  the  baseline  constant  without  the  need  of  DLTS 
hardware.  Our  differenced  IGLS  algorithm  was  facilitated  by  our  understanding  of 
linear  prediction  and  would  be  difficult  to  develop  from  maximum  likelihood.  The 
performance  of  the  DIGLS  algorithm  was  verified  with  simulations  and  actual  DLTS 
data.  Finally,  we  concluded  the  chapter  with  additional  algorithms  developed  for 
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DLTS  and  generated  by  our  increased  understanding  of  the  superimposed  exponen¬ 
tial  problem. 
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VII.  Conclusion 


7.1  Primary  Contributions 

We  began  this  dissertation  by  developing  the  exact  maximum  likelihood  (ML) 
cost  function  for  estimating  the  parameters  of  superimposed  exponentials  in  zero 
mean,  white,  Gaussian  noise.  In  the  same  chapter.  Chapter  II,  we  showed  how  the 
ML  methodology  is  related  to  the  Cramer-Rao  bound  (CRB)  and  why  the  CRB 
is  useful  for  evaluating  the  performance  of  estimators.  In  the  next  chapter.  Chap¬ 
ter  III,  we  gave  a  historical  review  of  parameter  estimators  developed  with  the  linear 
prediction  (LP)  methodology.  At  the  end  of  the  chapter,  we  introduced  critical — 
previously  omitted — statistical  modeling  considerations  for  linear  prediction.  With 
our  statistical  considerations,  we  showed  how  the  exact  ML  cost  function,  for  the 
superimposed  exponential  parameter  estimation  problem,  can  be  developed  with  the 
linear  prediction  methodology. 

Attaining  maximum  likelihood  performance  with  linear  prediction  is  the  most 
important  contribution  of  our  work.  Linear  prediction  can  often  provide  critical  in¬ 
sight  into  the  analysis  of  a  problem.  Previous  to  our  work,  the  LP  approach  was 
hindered  by  the  erroneous  assumption  that  estimators  developed  from  linear  predic¬ 
tion  are  always  inferior  to  estimators  developed  from  maximum  likelihood.  With  our 
linear  prediction  methodology,  researchers  can  utilize  the  insights  and  flexibility  of 
linear  prediction  and  achieve  maximum  likelihood  estimation  performance. 

We  demonstrated  this  capability  in  our  deep  level  transient  spectroscopy  (DLTS) 
application  chapter.  Chapter  VI.  In  DLTS,  the  signal  is  complicated  by  the  presence 
of  a  baseline  constant.  With  our  linear  prediction  methodology,  the  exact  ML  cost 
function  for  the  DLTS  signal  was  intuitively  developed  for  minimization.  The  result- 
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ing  algorithm  provided  DLTS  parameter  estimates  significantly  more  accurate  than 
those  of  the  estimators  currently  in  use.  Our  algorithm  was  readily  accepted  by  the 
DLTS  community  because  of  their  previous  exposure  to  linear  prediction  and  their 
need  for  improved  capability.  In  fact,  our  DLTS  contribution  has  been  accepted  for 
publication  in  the  Journal  of  Applied  Physics  (30). 

1.2  Ancillary  Contributions 

In  our  research,  we  developed  additional  contributions.  In  the  linear  pre¬ 
diction  chapter  and  the  cost  function  minimization  chapter — Chapters  III  and  IV, 
respectively — we  proposed  and  defended  an  alternative  explanation  for  the  moderate 
success  of  total  least  squares  (TLS)  as  a  linear  prediction  based  approach  for  super¬ 
imposed  exponential  parameter  estimation.  We  suspect  that  the  readily  available 
minimum  norm  solution,  associated  with  overmodeling,  in  TLS  is  more  influential 
than  the  “increased  perturbation”  explanation  often  provided.  In  Chapter  IV,  we 
showed  how  TLS,  without  overmodeling,  is  actually  ill-conditioned  at  low  SNR. 

In  that  same  chapter,  we  showed  how  TLS,  without  overmodeling,  is  equiv¬ 
alent  to  the  eigenvalue-eigenvector  optimization  technique  used  extensively  for  ML 
cost  function  minimization.  We  compared  both  TLS,  without  overmodeling,  and 
eigenvalue-eigenvector  minimization  to  the  generalized  least  squares  (GLS)  approach 
and  concluded  GLS  is  more  appropriate  for  minimization  unless  additional  con¬ 
straints  must  be  accommodated. 

In  Chapter  V  we  carefully  analyzed  the  one  and  two  mode  real  exponential 
problem.  We  developed  unique  algorithms  as  alternatives  to  iterative  generalized 
least  squares  (IGLS)  and  exercised  IGLS  at  alternative  initial  conditions.  We  found 
the  IGLS  algorithm  to  be  consistently  effective  and  relatively  invariant  to  starting 
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position.  We  endorsed  the  IGLS  algorithm  as  the  most  versatile  tool  for  super¬ 
imposed  real  exponential  parameter  estimation.  However,  when  the  problem  was 
restricted  to  the  estimation  of  a  single  real  exponential,  we  found  our  application  of 
the  golden  section  search  to  be  superior.  This  fact  was  facilitated  by  our  observation 
that  the  exact  ML  cost  function,  for  the  single  real  exponential  problem,  is  unimodal 
in  A — accept  under  very  low  SNR  scenarios. 

1.3  Immediate  Additional  Work 

Programming  the  differenced  iterative  generalized  least  squares  algorithm  is 
straight  forward.  Nevertheless,  the  algorithm  needs  to  be  packaged  into  a  user 
friendly  program  for  wide  spread  DLTS  applications. 

Also,  the  single  mode  analysis  for  estimating  the  A  of  a  real  exponential  in 
Chapter  V  should  extend  to  other  single  mode  single  parameter  exponential  prob¬ 
lems  such  as  the  frequency  of  a  pure  sinusoid.  The  potential  of  directly  minimizing 
the  exact  ML  cost  function  for  a  pure  sinusoid  with  such  techniques  as  a  golden 
section  search  needs  to  be  investigated.  Accurate,  efficient  single  sinusoid  frequency 
estimation  is  a  high  interest  subject  (34,  66). 

Finally,  the  two  mode  analysis  for  estimating  the  A„  of  two  real  exponen¬ 
tials  in  Chapter  V  should  extend  to  other  two  mode  single  parameter  exponential 
problems  such  as  two  superimposed  sinusoids  at  a  known  frequency  with  different 
unknown  phases.  This  problem  arises  in  communication  theory  and  might  by  effec¬ 
tively  resolved  by  constraining  ICLS  for  phase  only  estimation.  As  always,  further 
investigation  is  desired. 
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Appendix  A .  Properties  of  Special  Matrices 

A.l  Vandermonde  Matrix  V 

The  M  X  N  matrix  V,  defined 

(277)  V=  ^(Ai)  v{X2)  •••  fi(Ajv) 
where 

(278)  fi(A„)=[l  A„  XI  ...  X^-^f 

and  M  >  N,  has  Vandermonde  structure.  When  each  A„  is  distinct,  each  column 
of  V  is  linearly  independent.  One  way  to  prove  this  is  to  consider  an  M  x  M 
Vandermonde  matrix  Vm  with  M  distinct  A^.  The  determinant  of  Vm  is 

M 

(279)  |Kr|=  n 

mj  ,m2=l 
mi>ni2 

For  the  determinant  of  Vm  to  equal  zero,  A^i  must  equal  Xm2-  Since  we  know 
each  Xm  is  distinct  the  determinant  of  Vm  is  not  equal  to  zero.  Therefore  Vm  is 
nonsingular  and  each  column  is  linearly  independent  (25:29).  Because  any  subset  of 
linear  independent  vectors  is  also  linear  independent  the  matrix  V  formed  from  any 
N  columns  of  Vm  has  full  column  rank. 


A.S  Toeplitz  Data  Matrix  Y  and  Y 

The  (M  —  N)  X  {N  +  1)  data  matrix  Y,  defined 

(280)  Y^  yQ  ...  yjy 

where 

tT 

(281)  Pn  =  y[N  —  n]  y[N  +  1  —  n]  •  •  •  y[M  —  1  —  n] 

and  M  >  N,  has  Toeplitz  structure.  Because  Y  is  comprised  of  stochastic  elements, 
its  properties  have  statistical  connotations.  First  consider  the  underlying  determin¬ 
istic  elements,  s[m],  under  the  same  construct  of  Y  so  that 

(282)  S  =  So  Si  ••  •  Siv’ 
where 

-,T 

(283)  Sn=  s[A^  — n]  s[iV-fl  — n]  •••  5[M  —  1  —  n] 

It  was  noted  by  Tufts  and  Kumaresan  (72:977)  that  the  rank  of  S  is  N.  This  is 
justified  by  the  relationship 

(284)  5  =  Vc. 

Every  column  of  S'  is  a  linear  combination  of  the  first  M  —  N  elements  of  the  N 
linearly  independent  columns  of  V.  Therefore,  the  rank  of  S  is  the  rank  of  V  which 
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is  N.  From  the  linear  prediction  equation,  we  also  know, 


(285) 


Sb  =  0 


where  6  is  a  vector  of  the  linear  prediction  coefficients  6o  ■  •  •  Therefore,  since  bo 
can  be  normalized  to  1  without  altering  the  relationship. 


(286) 


sq  T  Sb  —  0 


where 


(287) 


S  = 


Si  S2  Sjv 


and 


(288) 


bi  b2 


3iV 


Because  6  is  a  solution  to  so  +  Sb  =  0,  the  rank  of  S  must  equal  the  rank  of  S  which 
is  N  (21:140).  With  dimension  (M  —  N)  x  N,  S  has  full  column  rank.  Therefore, 
each  column  of  S  is  linearly  independent. 

In  our  research. 


(289) 


yn  =  Sn  +  W 
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where  the  stochastic  w  has  mean  vector  zero.  When  we  consider  the  (M  —  N)  x  N 
data  matrix 


(290) 


Y  = 


yi  2/2  •  •  •  Vn 


we  know  the  underlying  signal  matrix  has  rank  N.  We  do  not  expect  the  addition  of 
stochastic  elements  to  the  underlying  signal  matrix  in  Y  to  cause  linear  dependence 
in  the  columns  of  Y.  Therefore,  Y  is  assumed  to  have  full  column  rank,  N.  In 
stochastic  theory,  every  column  of  Y  is  considered  to  be  linearly  independent  as 
well  (56:275).  Therefore,  Y  is  assumed  to  have  full  column  rank,  A^  + 1,  even  though 
its  underlying  signal  matrix  rank  is  N.  This  theory  extends  to  all  noisy  data  matrices 
with  any  number  of  columns.  We  can  assume  all  noisy  data  matrices  have  full  column 
rank. 
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Appendix  B.  Complex  Gradients 


B.l  Derivation 

Let  b  and  6  be  complex  iV  x  1  vectors  and  let  ^  be  a  Hermitian  N  x  N  matrix. 
Recall  the  gradient  definitions 


^  _  1 

89  ~  2  ^Wi 

and 


(291) 


(292) 


89*  ~  2  '^^89i) 


where  L  is  a  scalar  function  and  9  =  9r  +  j9i  {9j.  and  9i  are  real  vectors). 
Notice  that  under  the  gradient  definitions, 


(293) 


8L  _  8L 

^  ~  e{l+j{-et)) 

2  {dSr 

=  i 

2  \89r  89i) 


Therefore,  ~  is  the  complex  conjugate  of 

Next,  consider  the  gradient  ||  when  the  vector  complex  variable,  9,  is  reduced 
to  a  scalar  complex  variable,  9,  so  that 


(294) 


^  _  1  _  .8^ 
89  ~  2  ^89i}  ’ 
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With  this  definition, 


^  ^  1  /  d{6r  +  jOj)  .d{6r  + 

86  ~  2\  86,  ^  86i  ) 

1  /  86,  .  86i  .  86,  86 i  N 

2\W,'^^W,~ 

=  l(l  +  0-0  +  l) 

(295)  =  1 

and 

^  ^  1  (8{6,-j6i)  _  .8[6,-j6i)\ 

86  2  V  86,  ^  86i  ) 

86,  .86i  .86,  A 

=  i(l-O-O-l) 

(296)  =  0. 

Now,  we  can  consider  the  gradient  of  b^6  with  respect  to  a  vector  complex 

variable  6.  Let 

L  =  b^6 

(297)  =  f;  b'J,. 

X=1 
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Utilizing  the  definition  of  defined  in  Equation  294,  for  each  scalar  complex  vari¬ 
able  On  in  6,  we  know 


a  (Ell  (.:«.)  f 

dOn  ^  ''dOn 

h* 

^dOn 

(298) 

=  K- 

Therefore, 

(299) 

d(m) 

do 

Similarly,  let 

L  =  Fb 

(300) 

N 

=  E 

X=1 

From  the  previous  methodology,  we  know 

a(Eli9X)  ^  an 

90  n  h  ^90n 

h 

^dOn 

(301) 

=  0. 

Therefore, 

(302) 

a{m) 

do 
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Finally,  let 


(303) 


=  e^A0 

N  N 

=  EE«>«A- 

y=l  a;=l 


Again  with  the  established  methodology,  we  know 


d6„. 


(304) 


V'V'/'  a 


■55(-'-S) 

=  f  « 


X=1 


Therefore, 


(305) 


Consequently,  from  the  complex  conjugate  relationship 


(306) 


dL  _  /^y 
de*~[de)  ’ 
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we  know 


(307) 
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Appendix  C.  Theory  of  the  Cramer-Rao  Bound 


C.  1  Property  One 

Recall  from  the  first  property  of  the  theory  of  the  CRB  that  if  a  PDF,  f{y;  9), 
satisfies  the  “regularity”  condition,  then  the  error  covariance  matrix,  C,  of  any  un¬ 
biased  estimator,  g{y),  must  satisfy  the  condition  that  C  —  F{9)~^  is  positive  semi- 
definite.  The  matrix,  F{9),  is  the  Fisher  information  matrix  and  is  defined  as  the 
covariance  matrix  of  the  gradient  of  the  log-likelihood  function 


(308) 


F{e)  =  E 


d\nf{y-,e) 

d6 


d\nf{y\d) 

89 


For  the  difference  of  matrices,  C  —  F{9)~^,  to  be  positive  semi-definite,  each  element 
of  the  diagonal  of  C  must  be  greater  than  or  equal  to  each  element  of  the  diagonal 
of  F{9)~^.  This  implies 


(309)  =  E  {(5(y)„  -  9nMy)n  -  On)^}  >  F{9)-1 


We  now  prove  that,  with  the  “regularity”  condition  satisfied  and  the  existence 
of  an  unbiased  estimator  assumed,  the  difference  of  matrices,  C  —  F{9)~^,  must  be 
positive  semi-definite.  The  proof  follows  the  delivery  of  Scharf  (61:221-229). 

By  definition,  the  area  under  a  PDF  must  equal  1.  Therefore, 

/OO  _ 

fir,  0)dy  =  1 

-OO 


(310) 


and 


The  middle  expression  in  Equation  315  is  the  definition  of  the  expected  value  for 
the  gradient  of  the  log-likelihood  function.  This  implies  that  the  mean  vector  of  the 
gradient  of  the  log-likelihood  function  equals  0  or 


183 


With  the  mean  vector  equal  to  0,  the  covariance  matrix  of  the  gradient  of  the  log- 
likelihood  function  is 

The  matrix,  F{6),  is  the  Fisher  information  matrix.  Because  F{9)  is  con¬ 
structed  from  a  covariance  matrix  and  because  each  of  the  y{m]  elements  in  y  are 
stochastically  independent,  the  Fisher  information  matrix  is  positive  definite  (54:190). 
Thus  F{6)~^  exists.  Also,  by  construction,  F{6)  is  Hermitian. 

Continuing  the  proof,  when  we  assume  the  existence  of  an  unbiased  estimator, 
we  assume 


(318) 


E  {9(5)}  =  «■ 


Since  E{6}  =  0f{y\  9)dy  =  9,  we  can  say 


(319) 


E{g{y)}-E{9}  =  0 

E  {g{y)  -9^  =0 

=>  /~oo  {giy)  -  0)  f{y\  9)dy  =  0 

IZo  f{m (g{y)  -9)^  dy  =  o^. 
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With  the  “regularity”  condition  and  the  chain  rule,  we  know 


(320) 


m  I-oo  dy  = 

.00  9[f{y]6)(g{y)-6Y^ 

i-oo  a? 


L 

j°°  df{y]e) 

i: 


{9{y)-dY  +  f{m  0)  - 


df{y\0) 


89 


{g{y)-dY  f{y\9)dy-i 


QQ  {giy)  ~^Y  ^y~  J_^  f(y>  ^)^^y 

°°  din  f{y]  9) 

00 

H 


89 


89 


80^ 
89  ■ 


dy 


Therefore, 


(321) 


E 


din  f{y-,  9) 
89 


{9{y)-yY  \  "  ^ 


In  words,  the  cross-covariance  of  the  gradient  of  the  log-likelihood  function  and  the 
estimator  error  vector  is  equal  to  the  identity  matrix. 

Let  us  propose  a  vector  comprised  of  the  estimator  error  vector  stacked  over 
the  gradient  of  the  log-likelihood  function  represented  by 


(322) 


9{y)  -  d 


a  In  f{y,9) 
89 


2Nxl 


where  N  is  the  number  of  parameters  to  be  estimated.  With  the  assumptions  and 
derivations  just  presented,  we  know  the  mean  of  such  a  vector  is  0  and  the  covariance 
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matrix  is 


Qi  =  E{ 


9{y)  -  d 

9  In  f(y-,S) 

ae 


(m-ef 


(323) 


C  I 
I  F{e) 


Because  Qi  is  a  covariance  matrix,  it  is  at  least  positive  semi-definite  (54:190). 
Therefore,  for  any  vector  u, 


(324) 


u^Qiu  >  0. 


If  we  define  the  vector  u  =  Wv,  where  W  is  any  matrix  of  acceptable  dimensions, 
then  for  any  vector  v, 

(325)  v^W^QiWv  >  0. 


Therefore,  W^QiW  is  also  positive  semi-definite  (25:399). 
Let 


(326) 


W  = 


I  0 
-F(^)-i  I 
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so  that 


(327) 


<52  =  W^QxW 


I  -F(0)-i 

C  I 

I  0 

0  I 

I  F{9) 

-F{9)-'^  I 

C  -  F{6)-^  0 

0  F{6) 


and  Q2  is  positive  semi-definite. 

To  characterize  the  properties  of  the  C  —  F{6)~^  submatrix  of  Q2,  let  t;  be  a 
2N  X  1  vector  where  the  first  N  elements  are  arbitrary  and  the  last  N  elements  are 
zero.  Also,  let  be  an  A  x  1  vector  of  just  the  first  N  arbitrary  elements  of  v,  and 
let  Q2  equal  the  difference  of  matrices  C  —  F{6)~^.  With  these  definitions,  we  know 


(328) 


V^Q2V  =  V^Q2V  >  0. 


Since  the  elements  of  v  are  arbitrary,  Q2  —  C  —  F{6)~^  must  also  be  positive  semi- 
definite.  If  we  go  one  step  further  and  let  only  one  of  the  first  N  elements  of  v  and 
V  be  arbitrary  and  non-zero,  then 


(329) 


V^Q2V  =  V^Q2V  >  0 


implies  the  elements  on  the  diagonal  of  the  difference  of  matrices,  Q2  =  C  —  F{9)~^ 
must  be  greater  than  or  equal  to  zero.  Therefore,  the  elements  on  the  diagonal  of 
C  must  be  greater  than  or  equal  to  the  elements  on  the  diagonal  of  F(9)~^.  Thus 
proving  the  first  property  of  the  CRB. 
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C.2  Property  Two 


The  second  property  from  the  theory  of  the  CRB  states  that  an  unbiased 
estimator,  g{y),  may  be  found  that  attains  the  CRB,  in  that  C  =  if  and 

only  if 

(330)  SlnfM  ^ 

We  prove  the  if  portion  of  this  statement  by  assuming 

(331)  «h0iS  =  p(,-)(,(S)_,-). 

This  implies 

(332) 

=  m  (g(y)  -  S)  (j(y)  -  e^FiSr 
=  B{f(«)(s(!/)-«)(s(S)-«)"W} 

F{e)  =  F{e)CF{9)^ 

F{e)-^  =  c 

Recall  that  a  parameter  error  covariance  matrix  of  the  form 

(333)  -E  I  ((/(y )  -  ^)  (giV)  -  ^)  ^  } 


is  predicated  on  an  unbiased  estimator  defined  by 

(334)  B{s(S)}  =  «. 
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Therefore,  if 


(335)  ^  _ 

then  g{y)  is  the  unbiased  estimator  that  attains  C  =  F{9)~^. 

For  the  only  if  portion  of  the  proof,  assume  g{y)  exists  and  is  unbiased.  Recall 
that  when  an  estimator  is  unbiased, 

(336) 

This  implies 

(337)  E  {^^4^  (s(y)  -  «y  I  '=?  =  !. 

By  Schwartz’  inequality  for  random  variables  (61:227), 

(338)  s  j  ^ 

which  can  also  be  represented  as 

(339)  I  <  F{e)C. 

The  matrix  inequality  of  Equations  338  and  339  implies  that  the  difference  of  ma¬ 
trices,  F{6)C  —  I,  is  positive  semi-definite  and  transforms  to  an  equality  if  and  only 
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if 


(340)  = 

Thus  proving  the  second  property  from  the  theory  of  the  CRB. 
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Appendix  D.  Sequential  Quadratic  Programming  Gradients 


D.  1  Cost  Function  Gradient 

As  developed  in  Chapter  II,  the  exact  maximum  likelihood  cost  function  in 
terms  of  the  linear  prediction  coefficients,  b,  is 

(341)  L{b)  =  y^B^{BB^)-^By 

where 

bN  bisr-\  •  •  •  6o  0  •  •  •  0 

0  6jv_i  •  •  •  6o  •  •  : 

:  0 
0  •  •  •  0  fejv  •  •  •  bo 

Because  L  is  a  scalar  function,  the  gradient  of  L  with  respect  to  b  takes  the  form 

dL(b) 
dh 
dL(b) 
db2 

dL(b) 
dbN 

Let  5+  represent  the  pseudo-inverse  B^{BB^)~^  so  that 
(344)  L{b)  =  y^B+By 
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and 


(345) 


dL{b)  dL{y^B+By) 
db  ~  db 

=  y^^-m^y. 
^  db  ^ 


In  our  application,  we  programmed  the  SQP  algorithm  for  the  two  mode  prob¬ 
lem,  but  because  the  projection  matrix,  B'^B,  is  idempotent  and  Hermitian,  we  can 
use  the  chain  rule  to  express  each  partial  derivative  of  the  gradient,  in  a 

generalized  form.  This  insight  is  conveyed  by  Magnus  and  Neudecker  in  (49).  Since 


(346) 


B+B  =  B+BB+B 


then 


d{B+B)  _  d{B-^BB+B) 


db^ 


(347) 


dbn 


db 


dK 


dbn  \  dbn 


H 


Also  from  Equation  346 


(348) 


B 

dB  _  d(BB+B) 

dbn  ~ 


BB+B 


dB 

dbn 


dbn 

MB+B  +  saga, 
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Rearranging  Equation  348  we  see 


(349) 


86,  86,  86, 


Finally,  Equation  349  can  be  substituted  back  into  Equation  347  so  that 


(350) 


d{B+B) 

dbn 


Therefore,  calculating  the  gradient  of  the  log-likelihood  function,  L{b),  is  reduced 
to  determining  for  each  mode  and  inserting  it  back  into  Equations  350  and  345. 
With  B  defined  by  equation  342,  the  partial  derivative  of  B  with  respect  to  each 
is  the  identity  matrix  shifted  for  the  specific  n. 


D.2  Boundary  Gradients 

The  gradients  of  the  boundaries  of  the  feasibility  region  are  also  needed  for 
the  SQP  algorithm.  The  boundary  gradients  for  the  two  mode  problem  are  straight 
forward. 

D.2.1  Boundary  A. 


(351) 


GA{b) 

dGAjb) 

db 


—  —bn 


193 


D.2.2  Boundary  B. 


(352) 


Gj,ih)  =  -61-62-1 


dGA{h) 

dh 


-1 

-1 


D.2.3  Boundary  C. 


(353) 


GAib)  =  - 

-\bi+b 

dGAib) 

db 

1 
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